
I have a data.frame and I want to bin its values into quartiles to make the data simpler to read:

> head(Quartile)
             GSM1321374 GSM1321375 GSM1321376 GSM1321377 GSM1321378 GSM1321379
1415670_at    11.203302  11.374616  10.876187   11.23639   11.02051  10.926481
1415671_at    11.196427  11.492769  11.493717   11.01683   11.15016  11.576188
1415672_at    11.550974  11.267559  11.800991   11.57551   10.93359  11.222779
1415673_at    11.293390  10.978280  11.367316   10.45135   10.35822  10.234964
1415674_a_at   9.254073  10.572670   9.361991   11.26998   10.21125  10.245857
1415675_at     9.922985   9.228195   9.798156   10.02844   10.19928   9.749947

I applied the following function and it did the job.

quantfun <- function(x) as.integer(cut(x, quantile(x, probs=0:4/4), include.lowest=TRUE))
a <- apply(Quartile,1,quantfun)
b <- t(a)
colnames(b) <- colnames(Quartile)

And the output is:

> head(b)
             GSM1321374 GSM1321375 GSM1321376 GSM1321377 GSM1321378 GSM1321379
1415670_at            3          4          1          4          2          1
1415671_at            2          3          4          1          1          4
1415672_at            3          2          4          4          1          1
1415673_at            4          3          4          2          1          1
1415674_a_at          1          4          1          4          2          3
1415675_at            3          1          2          4          4          1

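But the problem is that it computes the quantile breaks separately for each slice of the data (`apply` with margin 1 works row by row), and I want one uniform set of quantile breaks for the whole data.frame. The breaks differ from slice to slice, for example between columns: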

> duration = Quartile$GSM1321374
> quantile(duration)
       0%       25%       50%       75%      100% 
 9.254073  9.922985 11.120381 11.203302 11.550974 
> duration = Quartile$GSM1321375
> quantile(duration)
       0%       25%       50%       75%      100% 
 9.228195 10.572670 10.946407 11.267559 11.492769 
user3253470
  • Try `Quartile[] <- matrix(quantfun(unlist(Quartile)), nrow(Quartile))` instead of the apply function. – Pierre L Oct 14 '15 at 13:53
  • @Pierre Lafortune And how can I check the range that each quantile covers? – user3253470 Oct 14 '15 at 13:56
  • Are you looking for `quantile(unlist(Quartile))`? – Pierre L Oct 14 '15 at 13:58
  • I mean something like bins, i.e. values from 9.0087 to 9.1078 are in "1", values from 9.1079 to 10.0345 are in "2", and so on. – user3253470 Oct 14 '15 at 13:59
  • That is the way. Each quantile represents one bin. – Pierre L Oct 14 '15 at 14:01
  • This is what I get from `quantile(unlist(Quartile))`: 0% = 1.00, 25% = 1.25, 50% = 2.50, 75% = 3.75, 100% = 4.00. But I want the ranges of these quantiles, e.g. values from 9.0087 to 9.1078 are in "1", values from 9.1079 to 10.0345 are in "2", and so on. – user3253470 Oct 14 '15 at 14:04
  • `Quartile` was overwritten. Find the ranges before running the function, or create a copy with `Quartile2 <- Quartile` and run the function on the copy: `Quartile2[] <- matrix(quantfun...` – Pierre L Oct 14 '15 at 14:06
  • I'm not getting this. Could you please add a proper answer to this question, since you understand what I am looking for? Thanks – user3253470 Oct 14 '15 at 14:08

1 Answer


Find the quartile ranges of your data frame first to get your bins:

quantile(unlist(Quartile))
       0%       25%       50%       75%      100% 
 9.228195 10.229036 10.997555 11.275832 11.800991 

We now have the range of each bin (e.g. the first bin covers 9.228 to 10.229). Then create the quartile data frame:

Quartile[] <- matrix(quantfun(unlist(Quartile)), nrow(Quartile))

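This relies on the fact that unlist(Quartile) flattens the data frame into a single vector, column by column, so the breaks are computed over all values at once and matrix(..., nrow(Quartile)) puts the bin numbers back in the same order. Note that this overwrites Quartile. If you would like to leave the original data frame intact, work on a copy: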

Quartile2 <- Quartile
Quartile2[] <- matrix(quantfun(unlist(Quartile2)), nrow(Quartile2))
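To see the bin numbers and the range each one stands for side by side, here is a minimal sketch on made-up data. The names `toy`, `toy_binned` and `bin_ranges` and the values in them are purely illustrative and not from the question; the approach itself is the same `unlist` + `quantile` + `cut` combination as above.

```r
# Toy data (hypothetical values, not the asker's expression matrix)
toy <- data.frame(a = c(1, 2, 3, 4),
                  b = c(5, 6, 7, 8),
                  row.names = c("p1", "p2", "p3", "p4"))

# Compute the breaks once, over every value in the data frame
vals   <- unlist(toy)
breaks <- quantile(vals, probs = 0:4 / 4)

# Assign each value to a bin; include.lowest keeps the minimum in bin 1
bins <- cut(vals, breaks, include.lowest = TRUE)

# Fill a copy with the integer bin numbers so the original values survive
toy_binned   <- toy
toy_binned[] <- matrix(as.integer(bins), nrow(toy))

# Lookup table: the numeric range that each bin number represents
bin_ranges <- data.frame(bin  = 1:4,
                         from = unname(breaks[-length(breaks)]),
                         to   = unname(breaks[-1]))

print(toy_binned)
print(bin_ranges)
```

Printing `levels(bins)` gives the same ranges as interval labels. One caveat: if the data has many tied values, two quantiles can coincide, and `cut` then stops with an error about non-unique breaks, so the breaks may need deduplicating in that case.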
Pierre L