
Is it possible to select the columns of a data frame whose factor levels (used or unused) match certain values? I can summarise by levels or subset rows, but I would like to select columns from the data frame, or at least list the variables/columns, that have certain factor levels.

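    library(dplyr)
    library(tibble)  # rowid_to_column() comes from tibble, not dplyr

    height <- c(132,151,162,139,166,147,122)
    weight <- c(48,49,66,53,67,52,40)
    gender <- c("male","male","female","female","male","female","male")
    gender2 <- c("female","male","male","male","male","female","male")
    genderx <- c("xfemale","malex","malex","male","male","xfemale","xfemale")

    # stringsAsFactors = TRUE turns the character columns into factors
    # (the default has been FALSE since R 4.0.0)
    df <- data.frame(height, weight, gender, gender2, genderx,
                     stringsAsFactors = TRUE) %>%
      rowid_to_column("ID")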

Something like (or not like) this pseudocode:

    %>% select(vars(levels == c("male", "female")))
23stacks1254

1 Answer


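We can use select_if with a predicate that checks that the column is a factor and that its levels (used or unused) include both "male" and "female". This only matches if the columns really are factors, so on R >= 4.0 the data frame has to be built with stringsAsFactors = TRUE. Note that select_if is superseded as of dplyr 1.0.0; select(where(...)) is the current equivalent.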

library(dplyr)
df %>% 
    select_if(~ is.factor(.) && all(c("male", "female") %in% levels(.)))

Or use any instead of all to keep the factor columns that have at least one of those levels

df %>% 
    select_if(~ is.factor(.) && any(c("male", "female") %in% levels(.)))
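
For newer dplyr versions, the same idea can be sketched with select(where(...)), which takes a predicate just like select_if. The helper name has_levels and its used_only argument are made up for this illustration and are not part of dplyr; the data is a cut-down copy of the question's data frame. Since levels() also reports unused levels, the used_only switch drops them first with droplevels(), which covers the "used or unused" part of the question. The last line lists matching column names in base R without selecting anything.

```r
library(dplyr)

# Cut-down copy of the question's data; stringsAsFactors = TRUE is needed
# on R >= 4.0 so that the character columns become factors.
df <- data.frame(
  height  = c(132, 151, 162, 139, 166, 147, 122),
  gender  = c("male", "male", "female", "female", "male", "female", "male"),
  genderx = c("xfemale", "malex", "malex", "male", "male", "xfemale", "xfemale"),
  stringsAsFactors = TRUE
)

# Hypothetical helper (not a dplyr function): TRUE when x is a factor whose
# levels contain every value in `wanted`. With used_only = TRUE, levels that
# no longer occur in the data are dropped before the check.
has_levels <- function(x, wanted, used_only = FALSE) {
  if (!is.factor(x)) return(FALSE)
  lv <- if (used_only) levels(droplevels(x)) else levels(x)
  all(wanted %in% lv)
}

# Select the columns that have both levels
both <- df %>% select(where(~ has_levels(.x, c("male", "female"))))

# Or only list the names of the matching columns (base R)
matches <- names(df)[sapply(df, has_levels, wanted = c("male", "female"))]

# Unused levels survive row subsetting, so the two modes can disagree
males <- df[df$gender == "male", ]
unused_counted <- has_levels(males$gender, c("male", "female"))
used_only_check <- has_levels(males$gender, c("male", "female"), used_only = TRUE)
```

Swapping all() for any() inside the helper gives the "at least one level" variant from the answer.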
akrun