Let us assume the following vector:

x = c( 0.5, 0.4, 0.8 )

where the x[1] and x[2] values are correlated, with the correlation matrix:

      x[1]  x[2]  x[3]
x[1]  1     0.8   0
x[2]  0.8   1     0
x[3]  0     0     1

I want to compute the average of x while taking the correlations into account.

I tried a generalised least-squares fit with lm(), but that means fitting a horizontal line (a constant), and lm() does not accept poly(x, 0). I also looked into using a user-defined function, but it would have to return the parameter being fitted…

As a concrete example, let us take three species from an evolutionary tree:

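library(ape)
## The evolutionary tree (random, so the output below varies between runs)
tree <- rtree(3)
## Plot it; note that two tips are closer to each other than to the third
plot(tree)
## Correlation matrix (vcv() dispatches to vcv.phylo for "phylo" objects)
vcv(tree, corr = TRUE)
##           t1        t3 t2
## t1 1.0000000 0.4019544  0
## t3 0.4019544 1.0000000  0
## t2 0.0000000 0.0000000  1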

Any tips welcome!

Xavier Prudent
  • I'm a little confused: are you suggesting that the individual elements of the vector `x` are correlated (i.e. that `x[1] = 0.5` and `x[2] = 0.4` are correlated)? That doesn't seem right (the correlation would not be defined). – user2957945 Dec 20 '14 at 12:28
  • Yes. For instance, imagine that I want to average the body size of people in a street. How should I deal with brothers? Their sizes are highly correlated because of their common parents. – Xavier Prudent Dec 20 '14 at 12:49
  • Your question is not clear at all: it seems that you do not know yourself which quantity you want to compute. What exactly do you mean by "I want to compute the average of x, but taking in account the correlations"? A weighted average? – Colonel Beauvel Dec 20 '14 at 13:33
  • Yes, a weighted average. With no correlation it would be (0.5+0.4+0.8)/3. But given the correlation between the 1st and 2nd measurements, they should not each get a weight of 1. So something like (0.5*0.5 + 0.4*0.5 + 0.8*1)/2 – Xavier Prudent Dec 20 '14 at 14:40
  • Can you provide an example of calculating the correlation of individual values? Specifically, how do you take a vector of three values and return a 3x3 correlation matrix? `cor(0.5, 0.4)` makes no sense mathematically (and returns `NA` in R). Once we're clear on this part, determining your weights should be relatively straightforward. – r2evans Dec 20 '14 at 20:13
  • Here is a concrete example where the correlation arises from a phylogenetic tree: – Xavier Prudent Dec 21 '14 at 21:38
  • I updated my question and added an example of where the correlation comes from. – Xavier Prudent Dec 21 '14 at 21:42
  • Looking at it now, I realise my question was cumbersome. Let us rephrase it: consider a family of 2 parents with 2 children, the 2 children being twins. You want to compute the average size in that family, but given that the children are twins, their size is highly correlated. Should that effect be included in the way the mean is computed? – Xavier Prudent Jan 06 '15 at 16:13
  • The phenomenon you are describing is properly called autocorrelation. You may have better luck searching for similar questions on stats.stackexchange.com. The issue may not lie in computing the mean, but in sampling from a variable with suspected autocorrelation. – vpipkt Jan 06 '15 at 16:44

1 Answer

The answer can be found in this CERN paper: preprint (HTTPS), preprint (FTP) or the published copy.

The procedure is a generalised least-squares regression.
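
As a rough sketch of how such a regression could be run with lm() in R: one common trick (an assumption here, not something taken from the paper) is to "whiten" the data with the Cholesky factor of the correlation matrix and then fit a constant through the origin of the transformed problem. The names `V`, `R`, `y`, `z` and `gls_mean_lm` are only illustrative.

```r
# Data and correlation matrix from the question
x <- c(0.5, 0.4, 0.8)
V <- matrix(c(1,   0.8, 0,
              0.8, 1,   0,
              0,   0,   1), nrow = 3, byrow = TRUE)

# chol() returns the upper-triangular R with t(R) %*% R == V
R <- chol(V)

# Whiten the observations and the constant regressor
y <- solve(t(R), x)
z <- solve(t(R), rep(1, length(x)))

# Ordinary least squares on the whitened data is GLS on the original data;
# "0 + z" suppresses lm()'s own intercept, so the only coefficient is the mean
fit <- lm(y ~ 0 + z)
gls_mean_lm <- unname(coef(fit))
gls_mean_lm
# [1] 0.6157895
```

This avoids poly(x, 0): the constant to be fitted enters as the whitened column of ones.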

See equation (2) on page 1 for the result.
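
For reference, here is a minimal sketch in R, assuming the result has the usual closed form of a GLS fit of a constant, i.e. weights proportional to the row sums of the inverse covariance matrix, giving mean = (1' V^-1 x) / (1' V^-1 1). The variable names (`V`, `w`, `gls_mean`) are mine; please check the formula against the paper before relying on it.

```r
# Data and correlation matrix from the question
x <- c(0.5, 0.4, 0.8)
V <- matrix(c(1,   0.8, 0,
              0.8, 1,   0,
              0,   0,   1), nrow = 3, byrow = TRUE)

# Solve V %*% u = 1 rather than inverting V explicitly
ones <- rep(1, length(x))
Vinv_ones <- solve(V, ones)

# Normalised weights: they sum to 1, and correlated values are down-weighted
w <- Vinv_ones / sum(Vinv_ones)
w
# [1] 0.2631579 0.2631579 0.4736842

# Correlation-aware weighted mean
gls_mean <- sum(w * x)
gls_mean
# [1] 0.6157895
```

With no correlation (V equal to the identity) the weights all become 1/3 and the plain mean is recovered. For the phylogenetic example, `V` could presumably be replaced by the matrix returned by vcv(), with `x` ordered to match its row names.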

Xavier Prudent