I have a dataset with more than 100 columns, all of type factor. For example:
animal  fruit   vehicle  color
cat     orange  car      blue
dog     apple   bus      green
dog     apple   car      green
dog     orange  bus      green
In my dataset I need to remove every column that has at least one factor level with fewer than 5 observations. In the example above, if I wanted to remove every column in which some level has 1 observation or fewer (such as blue or cat), the algorithm would remove the columns animal and color. What is the most elegant way to do this?
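
To make the rule concrete, here is a minimal base R sketch of the filtering I have in mind, run on the example data. The helper name `drop_sparse_factors` and its `min_count` argument are made up for illustration, and it assumes every column is a factor. It may well not be the most elegant way.

```r
# Example data from the question; every column is a factor.
df <- data.frame(
  animal  = c("cat", "dog", "dog", "dog"),
  fruit   = c("orange", "apple", "apple", "orange"),
  vehicle = c("car", "bus", "car", "bus"),
  color   = c("blue", "green", "green", "green"),
  stringsAsFactors = TRUE
)

# Hypothetical helper (not from any package): keep a column only if every
# one of its levels has more than `min_count` observations.
# Note: table() counts unused factor levels as 0, so a column with an unused
# level is dropped too; wrap `col` in droplevels() if that is not wanted.
drop_sparse_factors <- function(data, min_count) {
  keep <- vapply(data, function(col) all(table(col) > min_count), logical(1))
  data[, keep, drop = FALSE]
}

result <- drop_sparse_factors(df, min_count = 1)
names(result)
# [1] "fruit"   "vehicle"
```

For the real dataset, "fewer than 5 observations per level" would correspond to `min_count = 4` under this definition.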