
I'm trying to use the corr() function (from the boot package) to calculate weighted correlations. The way it works is that the first argument should be a matrix with two columns, corresponding to the two variables whose correlation we wish to calculate, and the second a vector of weights to be applied to each pair of observations.

Here is an example.

> head(d)
 Shade_tolerance htot
1            4.56 25.0
2            2.73 23.5
3            2.73 21.5
4            3.97 17.0
5            4.00 25.5
6            4.00 23.5

> head(poids)
[1] 5.200440e-07 5.200440e-07 1.445016e-06 1.445016e-06 1.445016e-06 1.445016e-06

> corr(d,poids)
[1] 0.1357279

So I got that working and I'm able to use it on my matrix, but I would like to compute a different correlation for each level of a factor, much as if I were using the tapply() function.

> head(d2)
  Shade_tolerance htot idp
1            4.56 25.0  19
2            2.73 23.5  19
3            2.73 21.5  19
4            3.97 17.0  18
5            4.00 25.5  18
6            4.00 23.5  18

So my dream would be to do something like this:

tapply(as.matrix(d2[,c(1,2)]), d2$idp, corr)

Except that, as you know, in tapply() the first argument needs to be a vector, not a matrix.

Would someone have any solution for me?

Thanks a lot for your help.

EDIT: I just realized that the part of the data frame I showed you is missing the weights for the weighted correlation. So the solution would somehow have to take both the matrix and the weights according to the levels of the factor.

> head(df)
  Shade_tolerance htot idp        poids
1            4.56 25.0  19 5.200440e-07
2            2.73 23.5  19 5.200440e-07
3            2.73 21.5  19 1.445016e-06
4            3.97 17.0  19 1.445016e-06
5            4.00 25.5  19 1.445016e-06
6            4.00 23.5  19 1.445016e-06

I hope it is clear.

Tom

3 Answers


If you have a "huge" data.frame, then using data.table might help:

library(boot)        # provides corr()
require(data.table)
dt <- as.data.table(df)
setkey(dt, "idp")
dt[, list(corr = corr(cbind(Shade_tolerance, htot), poids)), by=idp]

#    idp      corr
# 1:  18 0.9743547
# 2:  19 0.8387363
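If some rows contain NA in either variable, corr() will return NA for those groups. A minimal sketch of a wrapper that drops incomplete observations first (the wrapper name `corr_na` and the pairwise-complete behaviour are my own assumptions, not part of boot):

```r
library(boot)        # provides corr()
library(data.table)

# Hypothetical NA-safe weighted correlation: drop rows where either
# variable or the weight is missing before calling corr()
corr_na <- function(x, y, w) {
  ok <- complete.cases(x, y, w)
  corr(cbind(x[ok], y[ok]), w[ok])
}

dt[, list(corr = corr_na(Shade_tolerance, htot, poids)), by = idp]
```

Note that dropping rows changes which weights enter each group's correlation, so the result is only comparable across groups if the missingness is unrelated to the weights.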
Arun
  • I just edited my question. I was missing one element to compute my weighted correlations. df$poids which are the weights have to be taken into account somewhere. – Tom Mar 13 '13 at 09:10
  • @Tom, if you've huge data, try out the `data.table` solution in the edit. – Arun Mar 13 '13 at 09:26
  • it would be faster than the ddply solution? I'll try it then. Thanks. – Tom Mar 13 '13 at 09:27
  • @Tom, yes indeed. If you've too many values for `idp`, you'll see the difference. With `data.table`, your bottleneck is almost just the "corr" function. – Arun Mar 13 '13 at 09:29
  • I have more than 44 000 values of idp. I'll run your solution and let you know. – Tom Mar 13 '13 at 09:30
  • Perfect. It worked and it was very fast. Thanks a lot. And no issue with the NAs, it just gave me NA for these values of idp. – Tom Mar 13 '13 at 09:34

Here is a solution using the function ddply() from the plyr package.

library(plyr)
library(boot)  # provides corr()
ddply(df, .(idp),
      summarise, kor = corr(cbind(Shade_tolerance, htot), poids))
  idp       kor
1  18 0.9743547
2  19 0.8387363
Didzis Elferts
  • I took the ddply solutions just because I was already using it on the huge df I have, but could not find how to figure this out. It is running (huge df), I'll let you know in a few minutes if it worked. The only thing I am a bit worried about is that I have NA in my df for some values of both Shade tolerance and htot and I don't think corr() can manage with NA. – Tom Mar 13 '13 at 09:22
  • @Tom I updated my solution - replaced as.matrix() with cbind() because as.matrix() gave strange results. – Didzis Elferts Mar 13 '13 at 09:24
  • Ok. What do you think about the NAs issue? – Tom Mar 13 '13 at 09:26

Using by() and cbind():

library(boot)
by(dat, dat$idp, FUN = function(x) corr(cbind(x$Shade_tolerance, x$htot), x$poids))
dat$idp: 18
[1] 0.9743547
--------------------------------------------------------------------------------------- 
dat$idp: 19
[1] 0.7474093
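The same grouping can also be done in plain base R with split() plus sapply(), which returns a named vector rather than a by object (a sketch, assuming the data frame is named df with the columns shown in the question):

```r
library(boot)  # provides corr()

# Split the data frame by the factor, then apply the weighted
# correlation to each piece; names of the result are the idp levels
res <- sapply(split(df, df$idp),
              function(x) corr(cbind(x$Shade_tolerance, x$htot), x$poids))
```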
agstudy