Matrix of booleans based on quantile in R

Question

I have a matrix whose columns are stock returns and whose rows are dates, which looks like this:

              ES1.Index    VG1.Index   TY1.Comdty    RX1.Comdty   GC1.Comdty
1999-01-05  0.009828476  0.012405717 -0.003058466 -0.0003480884 -0.001723317
1999-01-06  0.021310816  0.027030061  0.001883240  0.0017392317  0.002425398
1999-01-07 -0.001952962 -0.016130850 -0.002826191 -0.0011591516  0.013425435
1999-01-08  0.007989946 -0.004071275 -0.005913678  0.0016224363 -0.001363540

I'd like a function that returns a matrix with the same column and row names, filled with 1s and 0s according to whether each observation in a row falls within a group bounded by two given quantiles of that row.

For example, I may want to divide each row vector into 3 groups and have 1s for all observations falling within the 2nd group and 0s elsewhere. The result would look something like:

           ES1.Index VG1.Index TY1.Comdty RX1.Comdty GC1.Comdty
1999-01-05         0         0          1          1          0
1999-01-06         1         0          0          1          0
1999-01-07         0         1          0          0          1
1999-01-08         0         0          1          0          1

(The 1s and 0s in my example are only meant to illustrate the shape of the output; the values aren't accurate.)

What would be the least verbose way to do that?

Division of five numbers into three groups by rank is ambiguous. Even supposing the ambiguity was resolved, the placement of zeros and ones in your example output makes no sense to me. — Frank, Nov 13 '15 at 16:17
I don't get your logic. Do you want the numbers within the second third of the range between the min and max value of each row? — Tensibai, Nov 13 '15 at 16:22
Try `t(apply(df1, 1, function(x) {x1 <- cut(x, breaks=3); +(levels(x1)[2]==x1)}))` — akrun, Nov 13 '15 at 16:25
@akrun your code matches the OP's description exactly, but gives a different result... I assume we need clarification from the OP — Tensibai, Nov 13 '15 at 16:32
@Tensibai Yes, that is why I didn't post that as a solution. — akrun, Nov 13 '15 at 16:33
If you do `quantile(r[i,], seq(0,1,1/3))` on each row i, you get a vector of quantiles that gives the thresholds needed to tell whether each observation falls within the quantiles you selected. For example, `quantile(r[1,], seq(0,1,1/3))` returns the vector -0.003058466 -0.001264908 0.006436288 0.012405717. The result I am looking for in my example is a 1 for those observations that fall between -0.001264908 and 0.006436288, i.e. my 2nd group. The 1s and 0s in my example were meant to be just a visual outcome; I am sorry for not making that clear. — Danny Zuko, Nov 13 '15 at 16:37
@DannyZuko I think you can change the `breaks` in my code to `quantile` you showed. — akrun, Nov 13 '15 at 16:39
It could be a variation of `t(apply(df1, 1, function(x) {x1 <- cut(x, breaks= quantile(x, seq(0, 1,1/3))); +(levels(x1)[2]== x1 & !is.na(x1))}))`, but I am not getting the exact result you showed. Can you check whether that is correct? — akrun, Nov 13 '15 at 16:47
The result I showed was not meant to be accurate; it was just to give a visual idea of the result I was looking for. Sorry for not making that clear enough. I don't know why yet, but `t(apply(r, 1, function(x) {x1 <- cut(x, breaks=3); +(levels(x1)[2]==x1)}))` and `t(apply(r, 1, function(x) {x1 <- cut(x, breaks= quantile(x, seq(0, 1,1/3))); +(levels(x1)[2]== x1 & !is.na(x1))}))` give different results. I trust the latter more, since the former returns false everywhere in the first two rows, which seems unlikely to be the expected result. — Danny Zuko, Nov 13 '15 at 17:05
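
The two calls discussed in these comments differ because `cut(x, breaks=3)` splits the range of `x` into three equal-width intervals, whereas passing `quantile(...)` as `breaks` puts the breakpoints at the terciles of the data. A minimal sketch on the first row of the sample data (the names `by_width`, `by_quantile`, `mid_width` and `mid_quantile` are illustrative and not from the thread):

```r
# First row of the sample matrix, values copied from the question
x <- c(ES1.Index = 0.009828476, VG1.Index = 0.012405717,
       TY1.Comdty = -0.003058466, RX1.Comdty = -0.0003480884,
       GC1.Comdty = -0.001723317)

# A single number for `breaks` asks for that many equal-width intervals
by_width <- cut(x, breaks = 3)

# A vector for `breaks` is used as the breakpoints themselves (here: terciles)
by_quantile <- cut(x, breaks = quantile(x, seq(0, 1, 1/3)))

# Counts per bin under each scheme
table(by_width)
table(by_quantile)

# Indicator for the middle bin under each scheme
mid_width    <- +(by_width == levels(by_width)[2])
mid_quantile <- +(by_quantile == levels(by_quantile)[2] & !is.na(by_quantile))
```

For this row the middle equal-width interval is empty, so `mid_width` is all zeros, which is consistent with the all-false rows reported above.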

score 1 · Accepted Answer · answered Nov 13 '15 at 18:15

Taking the intermediate steps of finding the quantiles and testing against them is not necessary. Only the ordinal properties of each vector matter.

# set bounds
lb = 1/3
ub = 2/3

# find ranks
p = t(apply(m,1,rank))/ncol(m)

# test ranks against bounds
+( p >= lb & p <= ub )


           ES1.Index VG1.Index TY1.Comdty RX1.Comdty GC1.Comdty
1999-01-05         0         0          0          1          1
1999-01-06         0         0          1          0          1
1999-01-07         1         0          1          0          0
1999-01-08         0         1          0          0          1
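
To reproduce the output above without the original data, here is a minimal sketch that rebuilds `m` from the four rows shown in the question and wraps the three steps in a function. The name `in_rank_band` is a hypothetical helper, not part of the answer.

```r
# Rebuild the four sample rows from the question (values copied from the post)
m <- matrix(
  c( 0.009828476,  0.012405717, -0.003058466, -0.0003480884, -0.001723317,
     0.021310816,  0.027030061,  0.001883240,  0.0017392317,  0.002425398,
    -0.001952962, -0.016130850, -0.002826191, -0.0011591516,  0.013425435,
     0.007989946, -0.004071275, -0.005913678,  0.0016224363, -0.001363540),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("1999-01-05", "1999-01-06", "1999-01-07", "1999-01-08"),
    c("ES1.Index", "VG1.Index", "TY1.Comdty", "RX1.Comdty", "GC1.Comdty")))

# Hypothetical helper: 1 where the within-row rank fraction lies in [lb, ub]
in_rank_band <- function(m, lb, ub) {
  # apply() returns one column per input row, so transpose back
  p <- t(apply(m, 1, rank)) / ncol(m)
  # unary plus turns the logical matrix into 0/1 and keeps the dimnames
  +(p >= lb & p <= ub)
}

res <- in_rank_band(m, 1/3, 2/3)
res
```

With five columns the rank fractions are 0.2, 0.4, 0.6, 0.8 and 1, so the band from 1/3 to 2/3 selects the observations ranked 2nd and 3rd in each row.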

score 0 · Answer 2 · answered Nov 13 '15 at 17:41

We can use `apply` with `MARGIN=1` to loop over the rows, `cut` each row vector with breaks given by its quantiles, and transpose the result so that the rows are dates again.

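t(apply(df1, 1, function(x) {
  # bin the row by its own tercile breakpoints; the row minimum becomes NA
  x1 <- cut(x, breaks = quantile(x, seq(0, 1, 1/3)))
  # 1 where the value is in the 2nd bin, 0 otherwise (including the NA)
  +(levels(x1)[2] == x1 & !is.na(x1))
}))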

akrun

I think it would make more sense to `include.lowest=TRUE` with the `cut`; the `NA`s are weird and (as far as I can tell) unnecessary. – Frank Nov 13 '15 at 18:11
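
A sketch of the `include.lowest=TRUE` variant suggested in the comment above. The helper name `in_quantile_group` and its arguments `k` and `n` are hypothetical, and the sketch assumes the quantile breakpoints of each row are distinct (`cut` stops with an error on duplicated breaks). Only the first two rows of the sample data are used to keep it short.

```r
# First two rows of the sample matrix, values copied from the question
m2 <- matrix(
  c(0.009828476, 0.012405717, -0.003058466, -0.0003480884, -0.001723317,
    0.021310816, 0.027030061,  0.001883240,  0.0017392317,  0.002425398),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    c("1999-01-05", "1999-01-06"),
    c("ES1.Index", "VG1.Index", "TY1.Comdty", "RX1.Comdty", "GC1.Comdty")))

# Hypothetical helper: 1 where a value falls in the k-th of n quantile groups
in_quantile_group <- function(x, k = 2, n = 3) {
  # include.lowest = TRUE puts the row minimum in group 1 instead of NA;
  # labels = FALSE returns the integer group codes directly
  g <- cut(x, breaks = quantile(x, seq(0, 1, length.out = n + 1)),
           include.lowest = TRUE, labels = FALSE)
  +(g == k)
}

out <- t(apply(m2, 1, in_quantile_group))
# cut() drops the element names, so restore the row and column names
dimnames(out) <- dimnames(m2)
out
```

Because no `NA` is produced, the `!is.na()` guard is no longer needed, and the same helper also works for the first group (`k = 1`).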
