I have the following data frame and I want to create a dummy variable, with an automatically calculated scale, that categorizes whether a city has few, a medium number of, or a lot of companies.
cities | sum of companies |
---|---|
CITY A | 199 |
CITY B | 358 |
CITY C | 250 |
CITY D | 1265 |
CITY E | 610 |
I tried the following code:
# install.packages("scales")
library(scales)
COMP_SCALES <- breaks_extended()  # break-finding function from the scales package
COMP_A <- COMP_SCALES(df[[2]], n = 4)  # df[[2]] extracts the column as a numeric vector
COMP_A <- cut(df[[2]],
              breaks = c(-Inf, COMP_A[2], COMP_A[3], COMP_A[4], Inf),
              labels = c("LITTLE", "MEDIUM", "A LOT OF", "+ A LOT OF"))
However, the automatically calculated scale is not suitable, since all the cities end up in the "LITTLE" range. How can I better automate this code?
The final purpose is to create a table that makes the result easier to visualize, something like this:
COMP_A_CLUSTER <- as.data.frame.matrix(table(COMP_A,kmeans.k$cluster))
Expected outcome: City A should be placed in "LITTLE", Cities B and C in "MEDIUM", City E in "A LOT OF", and City D in "+ A LOT OF".
I have more than 10,000 cities and more than 100 columns that need the same treatment, which is why I want the breaks for each dummy to be calculated automatically.
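For reference, one possible direction is to derive the breaks from each column's own quantiles instead of from axis-style "pretty" breaks. The sketch below is only an illustration under assumptions: the column names `city` and `companies`, the helper `bin_by_quantile`, and the cut-point probabilities 0.2, 0.6, and 0.8 are all made up for this example and are not taken from the real data.

```r
# Sample data from the table above; the column names are assumed.
df <- data.frame(
  city = c("CITY A", "CITY B", "CITY C", "CITY D", "CITY E"),
  companies = c(199, 358, 250, 1265, 610)
)

# Hypothetical helper: bin a numeric vector at its own quantiles.
# The inner probabilities (0.2, 0.6, 0.8) are an illustrative assumption.
bin_by_quantile <- function(x,
                            probs = c(0.2, 0.6, 0.8),
                            labels = c("LITTLE", "MEDIUM", "A LOT OF", "+ A LOT OF")) {
  inner <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  cut(x, breaks = c(-Inf, inner, Inf), labels = labels)
}

# Apply the helper to every numeric column so the breaks are derived per column.
num_cols <- names(df)[sapply(df, is.numeric)]
for (col in num_cols) {
  df[[paste0(col, "_bin")]] <- bin_by_quantile(df[[col]])
}

print(df)
```

On the five sample rows these particular probabilities happen to reproduce the expected outcome, but they would need tuning for the real data. If a column has many tied values, two quantiles can coincide and `cut()` will stop with a "breaks are not unique" error, so that case would need separate handling.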