Remove columns with factors that have fewer than 5 observations per level

Question

I have a dataset with more than 100 columns, all of type factor. For example:

          animal               fruit               vehicle              color 
             cat              orange                   car               blue 
             dog               apple                   bus              green 
             dog               apple                   car              green 
             dog              orange                   bus              green

In my dataset I need to remove every column that has a factor level with fewer than 5 observations. In this example, if I want to remove every column in which some level has 1 observation or fewer, such as blue or cat, the algorithm would remove the columns animal and color. What is the most elegant way to do this?
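Before filtering, it can help to see the smallest level count in each column, since that single number decides whether a column survives a given threshold. A minimal base R sketch, rebuilding the four-row example inline (min_counts is just an illustrative name):

```r
# Rebuild the example from the question; every column is a factor
df1 <- data.frame(
  animal  = factor(c("cat", "dog", "dog", "dog")),
  fruit   = factor(c("orange", "apple", "apple", "orange")),
  vehicle = factor(c("car", "bus", "car", "bus")),
  color   = factor(c("blue", "green", "green", "green"))
)

# table() counts observations per level; min() picks the rarest level
min_counts <- sapply(df1, function(x) min(table(x)))
min_counts
# animal = 1, fruit = 2, vehicle = 2, color = 1
```

Columns whose minimum falls below the threshold (animal and color for a threshold of 2) are the ones to drop.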

In the example, every column has 2 unique values. - akrun, May 14 '20 at 21:05

akrun · Accepted Answer · 2020-05-14T21:07:01.847

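We can use Filter with table. table(x) counts the observations in each level of a column, and Filter keeps only the columns for which the function returns TRUE, so any column with a level seen fewer than 2 times is dropped. For the threshold of 5 in the title, use table(x) < 5 instead: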

Filter(function(x) !any(table(x) < 2), df1)
#  fruit vehicle
#1 orange     car
#2  apple     bus
#3  apple     car
#4 orange     bus
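The same condition can be wrapped in a small helper so the threshold is a parameter. A minimal sketch; drop_sparse_factors and min_n are hypothetical names, not part of any package. It also ignores unused factor levels: table() reports them with a count of 0, so after subsetting rows the plain `!any(table(x) < 2)` test would drop a column even when every level that actually occurs is frequent enough.

```r
# Hypothetical helper: keep only the columns in which every observed
# level has at least `min_n` observations
drop_sparse_factors <- function(df, min_n = 5) {
  Filter(function(x) {
    counts <- table(x)
    counts <- counts[counts > 0]  # skip unused factor levels (count 0)
    all(counts >= min_n)
  }, df)
}

df1 <- data.frame(
  animal  = factor(c("cat", "dog", "dog", "dog")),
  fruit   = factor(c("orange", "apple", "apple", "orange")),
  vehicle = factor(c("car", "bus", "car", "bus")),
  color   = factor(c("blue", "green", "green", "green"))
)

drop_sparse_factors(df1, min_n = 2)        # keeps fruit and vehicle
drop_sparse_factors(df1, min_n = 5)        # no column qualifies in 4 rows
drop_sparse_factors(df1[2:4, ], min_n = 2) # unused "cat"/"blue" ignored: keeps animal and color
```

Note that a column consisting only of NA has an empty table and is kept by this sketch.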

data

df1 <- structure(list(animal = structure(c(1L, 2L, 2L, 2L), .Label = c("cat", 
"dog"), class = "factor"), fruit = structure(c(2L, 1L, 1L, 2L
), .Label = c("apple", "orange"), class = "factor"), vehicle = structure(c(2L, 
1L, 2L, 1L), .Label = c("bus", "car"), class = "factor"), color = structure(c(1L, 
2L, 2L, 2L), .Label = c("blue", "green"), class = "factor")),
row.names = c(NA, 
-4L), class = "data.frame")

score 0 · Answer 2 · answered May 15 '20 at 00:39

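We can use select_if from dplyr. Keeping the columns where all(table(.) > 1) is equivalent to the Filter condition in the accepted answer; use > 4 for the 5-observation rule: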

library(dplyr)
df1 %>% select_if(~all(table(.) > 1))

#   fruit vehicle
#1 orange     car
#2  apple     bus
#3  apple     car
#4 orange     bus
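Since dplyr 1.0.0, select_if() is superseded; the current idiom is select() with the where() selection helper. A sketch of the same filter, assuming dplyr 1.0.0 or later is installed, with the threshold held in a variable (min_n is an illustrative name):

```r
library(dplyr)

df1 <- data.frame(
  animal  = factor(c("cat", "dog", "dog", "dog")),
  fruit   = factor(c("orange", "apple", "apple", "orange")),
  vehicle = factor(c("car", "bus", "car", "bus")),
  color   = factor(c("blue", "green", "green", "green"))
)

min_n <- 2  # set to 5 for the rule in the question

# where() applies the predicate to each column; a column is kept
# when every level occurs at least min_n times
result <- df1 %>% select(where(~ all(table(.x) >= min_n)))
names(result)  # "fruit" "vehicle"
```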

Ronak Shah