0

There's lots of advice for how to calculate quantiles for a column of data, but I'm trying to label each data point with which quantile it belongs to based on some other field.

A super simple example:

pink<-data.frame(matrix(rnorm(20),nrow=100))
pink$color<-c("pink")
red<-data.frame(matrix(rnorm(50),nrow=100))
red$color<-c("red")
names(red)[names(red)=="matrix.rnorm.50...nrow...100."]<-"value"
names(pink)[names(pink)=="matrix.rnorm.20...nrow...100."]<-"value"
mydata<-rbind(red,pink)

So imagine all I have is the mydata data frame. I want a new column that assigns each row to a quantile based on the $value for the $color. In the case above, a row that has a value of 0.7 is going to be in the top quartile for pink but it wouldn't be for red.

How do I do this kind of "quantile by group/factor"? Thanks for any help!

  • Despite the fact that you got three answers, I find this question incoherent. Perhaps you want to determine "quartiles"? If so you should edit the question to make that clear. "Quantiles" and "quartiles" are _not_ the same words. – IRTFM Oct 03 '14 at 04:50

3 Answers3

2

You can use cut to get the quantile values, and also label them

newData <- lapply(split(mydata, mydata$color), function(x) {
    x$key <- with(x, cut(value, quantile(value), 
                  labels = 1:4, include.lowest = TRUE))
    x
})
lapply(newData, head)
# $pink
#           value color key
# 101 -1.10353351  pink   1
# 102  0.90278706  pink   4
# 103  0.06831835  pink   3
# 104 -1.14946795  pink   1
# 105 -0.90127498  pink   2
# 106 -1.19845352  pink   1
# 
# $red
#         value color key
# 1 -0.04827783   red   3
# 2  0.28444148   red   3
# 3 -2.34491308   red   1
# 4  0.35213987   red   4
# 5 -0.73670929   red   2
# 6  0.15286555   red   3

Then do.call(rbind, newData) to bring the two back together.

Rich Scriven
  • 97,041
  • 11
  • 181
  • 245
1

A solution with the data.table package:

    require(data.table)
    dt<-as.data.table(mydata)
    dt[,list(value,color,findInterval(value,quantile(value,c(.25,.50,.75)))),by=color]
nicola
  • 24,005
  • 3
  • 35
  • 56
  • could simplify the code slightly: dt[,list(value,findInterval(value,quantile(value,c(.25,.50,.75)))),by=color] – KFB Oct 03 '14 at 08:37
1

Try ave function:

> mydata$block= with(mydata, ave(value, color, 
           FUN=function(x)findInterval(x,quantile(x,c(.25,.50,.75))) ))
> head(mydata)
       value color block
1  1.6147642   red     3
2 -0.2315269   red     2
3 -0.6822261   red     1
4 -0.6143924   red     1
5  1.5840925   red     3
6 -0.8477495   red     0
rnso
  • 23,686
  • 25
  • 112
  • 234