
I want to calculate the quintile of groups in a data.frame such as this:

df <- data.frame(x=1:100, y=c(rep("A", 50), rep("B", 50)))

Using the ntile() function and group_by() from dplyr, I thought I could get the grouped quintiles such as here. However, as the table below shows, the quintiles have been calculated with respect to the whole dataset. I want a result with 10 observations in each quintile for both A and B in this case.

df$z <- df %>% group_by(y) %>% mutate(z = ntile(x, 5)) %>% pull(z)

table(df$y, df$z)

     1  2  3  4  5
  A 20 20 10  0  0
  B  0  0 10 20 20
Cettt
Marco Pastor Mayo
  • Your mutate statement already adds a column `z` to your data.frame, so there is no need to assign the result to a new column. Instead, you can do `df <- df %>% group_by(y) %>% mutate(z = ntile(x, 5)) %>% ungroup()`. This alone will not solve your problem, but I think using `dplyr::mutate` will. The version of mutate your code is using probably comes from the `plyr` package. – Bas Dec 18 '19 at 12:31
  • I cannot reproduce your example; for me your code works as intended. Maybe try starting a fresh R session. – Cettt Dec 18 '19 at 12:36

1 Answer


Make sure to start a new R session and try this:

library(dplyr)
df <- data.frame(x=1:100, y=c(rep("A", 50), rep("B", 50))) %>% 
   group_by(y) %>% mutate(z = ntile(x, 5))

table(df$y, df$z)
     1  2  3  4  5
  A 10 10 10 10 10
  B 10 10 10 10 10

Also, a dplyr alternative to table would be count:

count(df, y, z)
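
If restarting the session is inconvenient, one thing that might be worth trying is to call every dplyr function with an explicit namespace prefix, so that a mutate from another attached package cannot take over. This is only a sketch: it assumes the cause really is masking by plyr, as Bas suggests in the comments, which cannot be confirmed from the question alone. The nested-call style is used just to avoid needing %>% without attaching a package.

```r
# Sketch: namespace-qualified calls, so a masking mutate (e.g. from plyr,
# which is only a guess taken from the comments) cannot interfere.
df <- data.frame(x = 1:100, y = c(rep("A", 50), rep("B", 50)))

df <- dplyr::ungroup(
  dplyr::mutate(dplyr::group_by(df, y), z = dplyr::ntile(x, 5))
)

# Cross-tabulate group against quintile to inspect the bin sizes.
tab <- table(df$y, df$z)
print(tab)

# Optional check: which package does an unqualified mutate resolve to?
# Prints nothing if no attached package currently provides mutate.
if (exists("mutate")) print(environmentName(environment(mutate)))
```

If the last line prints something other than "dplyr" in your own session, that would support the masking explanation.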
Cettt