I have a database containing categorical variables, each with a very large number of categories.
I would like to recode each of them into fewer categories (two in this case), deciding which new category an original category goes into based on its mean value of another variable.
When there are only a few categories (10 in this case), I use this script:
library(plyr)  # provides revalue()
data$V152 <- as.numeric(data$V152)
data$V152 <- as.numeric(revalue(as.character(data$V152),
  c("2"="0", "3"="1", "4"="0", "5"="1", "6"="1", "7"="0", "8"="0", "9"="0", "10"="0")))
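For comparison, the same hard-coded mapping can be written in base R with a named lookup vector, which is also the form that can later be built programmatically instead of typed by hand. This is a sketch on hypothetical toy data; it keys on the category labels via `as.character()` rather than on factor codes, and, unlike `revalue()`, it turns any code missing from the lookup into `NA` instead of leaving it unchanged.

```r
# Hypothetical toy data standing in for data$V152 (category labels 2 to 10)
data <- data.frame(V152 = factor(c(2, 3, 4, 5, 6, 7, 8, 9, 10, 3)))

# Named lookup vector: names are the old labels, values are the new codes,
# mirroring the mapping passed to revalue() above
lookup <- c("2" = 0, "3" = 1, "4" = 0, "5" = 1, "6" = 1,
            "7" = 0, "8" = 0, "9" = 0, "10" = 0)

# Index the lookup by each row's label; unname() drops the names it carries over
data$V152_new <- unname(lookup[as.character(data$V152)])
print(data$V152_new)
```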
But how do I do this for a categorical variable with a very large number of categories?
Looking at the picture, I want the categories whose mean lies above the line to be recoded as 1 and the others as 2.
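One way to do this in base R, sketched under stated assumptions: the toy data frame, the column names `category` and `y`, and the threshold are all hypothetical. Since the picture is not available here, the cut-off is assumed to be the overall mean of `y`; substitute the actual value of the line. The code computes one mean per category with `split()` and `sapply()`, turns that into a category-to-code lookup, and then indexes the lookup by each row's label, so it works the same whether there are 4 categories or 4,000.

```r
# Hypothetical toy data: `category` is the many-level factor,
# `y` is the other variable whose per-category mean drives the split
data <- data.frame(
  category = factor(c("a", "a", "b", "b", "c", "c", "d", "d")),
  y = c(1, 3, 10, 12, 2, 2, 8, 6)
)

# Mean of y within each category: a named numeric vector, one value per level
cat_means <- sapply(split(data$y, data$category), mean)

# ASSUMPTION: the line in the plot is taken to be the overall mean of y;
# replace this with the real cut-off value
threshold <- mean(data$y)

# Categories whose mean is above the threshold become 1, the rest 2
# (ifelse keeps the category names from cat_means)
new_code <- ifelse(cat_means > threshold, 1, 2)
print(new_code)

# Map every row to its new code through its category label
data$category2 <- unname(new_code[as.character(data$category)])
table(data$category2)
```

Printing `new_code` shows which original category went into which group. If only the recoded column is needed, `ave(data$y, data$category)` returns each row's category mean directly, and comparing that with the threshold inside `ifelse()` gives the same 1/2 coding.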