
(This question is probably a duplicate, but I can't find it being asked yet...)

Using dplyr techniques, how can I select columns from a data.frame by both names & values at the same time? For example the following (which doesn't work):

> data.frame(x=4, y=6, z=3) %>%
    select_if(matches('x') | mean(.) > 5)
Error: No tidyselect variables were registered

In base R, I would do something like this:

> df <- data.frame(x=4, y=6, z=3)
> df[names(df) == 'x' | colMeans(df) > 5]
  x y
1 4 6
Ken Williams
  • Related/possible duplicate: https://stackoverflow.com/questions/55584714/how-to-select-columns-by-name-or-their-standard-deviation-simultaneously/ – Ronak Shah Apr 09 '19 at 15:01
  • Thanks @RonakShah, that indeed looks really close, though mine is a more distilled/abstract version of the question, and in the other Q they never really got to a nice clean answer like Andrew's. – Ken Williams Apr 09 '19 at 17:54

2 Answers


Update: Using dplyr v1.0.0:

data.frame(x=4, y=6, z=3) %>%
  select(matches("x"), where(~mean(.) > 5))
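tidyselect, which dplyr 1.0.0 uses for `select()`, also supports the boolean operators `|`, `&` and `!` between selections. That means the original `|` attempt can be written almost as-is inside `select()`, with `where()` in place of `select_if()`. A minimal sketch, assuming dplyr >= 1.0.0 is installed:

```r
library(dplyr)

df <- data.frame(x = 4, y = 6, z = 3)

# The union (|) of a name-based selection and a value-based selection,
# evaluated inside a single select() call.
res <- df %>%
  select(matches("x") | where(~ mean(.x) > 5))

print(res)
#   x y
# 1 4 6
```

Note that `matches("x")` is a regular-expression match, so it would also keep a column named e.g. `max_x`; a bare `x` or `any_of("x")` selects only the column named exactly `x`.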

Original answer: You could use `select` with the two selections separated by a comma, together with `colMeans`:

data.frame(x=4, y=6, z=3) %>%
  select(matches("x"), which(colMeans(.) > 5))
  x y
1 4 6
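To see why the `which()` is there: `colMeans(.) > 5` is a logical vector with one element per column, and `which()` turns it into the integer column positions that `select()` accepts. A small sketch of the intermediate values (the names `keep` and `pos` are just for illustration):

```r
library(dplyr)

df <- data.frame(x = 4, y = 6, z = 3)

# One mean per column, compared with 5: a named logical vector
# (x = FALSE, y = TRUE, z = FALSE).
keep <- colMeans(df) > 5

# which() converts the logical vector into named integer positions (y = 2).
pos <- which(keep)

# The positions are then combined with the name-based selection.
res <- select(df, matches("x"), which(keep))
```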
Andrew
  • Nice! I did not even think to try it like that. Went directly to `bind_cols` :) – Sotos Apr 09 '19 at 14:54
  • Cool, I didn't know (or forgot) that we could do `which(colMeans(.) > 5)` in a `select` clause. It would be great if we could get rid of the `which`, I wonder why a logical vector of the same length as the number of columns isn't allowed. – Ken Williams Apr 09 '19 at 14:59

We could use select_if to extract the column names that satisfy the condition, and use those names in select alongside the columns whose names match 'x':

data.frame(x=4, y=6, z=3) %>% 
     select(matches("x"), names(select_if(., ~ mean(.x) > 5)))
#  x y
#1 4 6

NOTE: Here we are using select_if because the OP wanted an answer specifically with it. Otherwise, it can be done in many other ways.
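One of those other ways, sketched under the assumption of a dplyr version that still ships the (superseded) `select_if()`: its `.predicate` argument is documented to accept a logical vector as well as a function, so the same mask as in the base R version from the question can be passed directly. The variable name `keep` is illustrative.

```r
library(dplyr)

df <- data.frame(x = 4, y = 6, z = 3)

# One TRUE/FALSE per column: the name is "x" OR the column mean exceeds 5.
keep <- names(df) == "x" | colMeans(df) > 5

# select_if() keeps the columns where the logical vector is TRUE.
res <- select_if(df, keep)
print(res)
#   x y
# 1 4 6
```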

akrun
  • Thanks. I wasn't specifically looking for a way to use `select_if`, just `dplyr` techniques in general. – Ken Williams Apr 09 '19 at 15:02
  • @KenWilliams The `colMeans` technique is `base R`. It is just a slight change in your original code with `base R` – akrun Apr 09 '19 at 15:03