I need to categorize a continuous variable in 4 classes each one with the same number of observations. I have used the function
cut(x, breaks = quantile(x,probs=seq(0,1,0.25)),include.lowest=TRUE,right=FALSE))
My problem is that the number of observations in each category is not exactly the same because there are observations (and more than one) which have exactly the same value of the quantiles. How can I do it?
My variable is waiting
[1] 79 54 74 62 85 55 88 85 51 85 54 84 78 47 83 52 62 84 52 79 51 47 78 69 74
[26] 83 55 76 78 79 73 77 66 80 74 52 48 80 59 90 80 58 84 58 73 83 64 53 82 59
[51] 75 90 54 80 54 83 71 64 77 81 59 84 48 82 60 92 78 78 65 73 82 56 79 71 62
[76] 76 60 78 76 83 75 82 70 65 73 88 76 80 48 86 60 90 50 78 63 72 84 75 51 82
[101] 62 88 49 83 81 47 84 52 86 81 75 59 89 79 59 81 50 85 59 87 53 69 77 56 88
[126] 81 45 82 55 90 45 83 56 89 46 82 51 86 53 79 81 60 82 77 76 59 80 49 96 53
[151] 77 77 65 81 71 70 81 93 53 89 45 86 58 78 66 76 63 88 52 93 49 57 77 68 81
[176] 81 73 50 85 74 55 77 83 83 51 78 84 46 83 55 81 57 76 84 77 81 87 77 51 78
[201] 60 82 91 53 78 46 77 84 49 83 71 80 49 75 64 76 53 94 55 76 50 82 54 75 78
[226] 79 78 78 70 79 70 54 86 50 90 54 54 77 79 64 75 47 86 63 85 82 57 82 67 74
[251] 54 83 73 73 88 80 71 83 56 79 78 84 58 83 43 60 75 81 46 90 46 74
which is in the dataset faithful in R. It has 272 observations, therefore it is divisible by 4 giving 68 observations in each category.
I have used
newwait<-cut(waiting, breaks =quantile(waiting,probs=seq(0,1,0.25)),include.lowest=TRUE,right=FALSE)
table(newwait)
newwait
[43,58) [58,76) [76,82) [82,96]
66 68 67 71
as you can see, the number of observations in each group is similar but not exactly the same.