Let's say I have a vector:

Q<-rnorm(10,mean=0,sd=20)

From this vector I would like to:

1. create 10 variables (a1...a10) that each have a correlation above .5 (i.e. between .5 and 1) with Q.

The first part can be done with:

t1<-sapply(1:10, function(x) jitter(Q, factor=100))

2. each of these variables (a1...a10) should have a pre-specified correlation with each other. For example some should be correlated .8 and some -.2.

Can these two things be done?

I create a correlation matrix:

cor.table <- matrix( sample( c(0.9,-0.9) , 2500 , prob = c( 0.8 , 0.2 ) , repl = TRUE ) , 50 , 50 )
k=1
while (k<=length(cor.table[1,])){
    cor.table[1,k]<-0.55
    k=k+1
    }
k=1
while (k<=length(cor.table[,1])){
    cor.table[k,1]<-0.55
    k=k+1
    }   
    diag(cor.table) <- 1

However, when I apply the excellent solution by @SprengMeister I get the error:

Error in eigen(cor.table)$values > 0 : 
  invalid comparison with complex values

continued here: Eigenvalue decomposition of correlation matrix

user1984076
    Crossvalidated answered similar question: [Generate a random variable with a defined correlation to an existing variable](http://stats.stackexchange.com/q/15011/8464) – topchef Sep 16 '13 at 14:22

3 Answers

As a pointer toward a solution, use the noise function jitter in R:

set.seed(100)
t = rnorm(10,mean=0,sd=20)
t1 = jitter(t, factor = 100)
cor(t,t1)
[1] 0.8719447
topchef
  • thanks, this is excellent help. I'll keep modifying the question until I get to the final result. Can you explain what the factor argument stands for? jitter help doesn't explain this. – user1984076 Sep 16 '13 at 11:51
  • from docs: _jitter_ has two options: _factor_ and _amount_. I suppose you got the part explaining _amount_. When _amount_ is _NULL_ (default) then function returns _x + runif(n, -a, a)_ where _a = factor * d/5_ where _d_ is the smallest difference between adjacent unique (apart from fuzz) _x_ values. – topchef Sep 16 '13 at 14:18
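
The relationship between factor and the amount of noise described in the comment above can be sketched as follows. This is a minimal illustration, assuming that jitter rounds x to 3 - floor(log10(range)) digits before taking the smallest gap d (my reading of "apart from fuzz"); the names x, fac, d, a and x_jit are introduced here for illustration only.

```r
set.seed(100)
x <- rnorm(10, mean = 0, sd = 20)
fac <- 100                      # the value passed as factor =

# Smallest gap between adjacent sorted values, after the rounding that
# jitter() is assumed to apply to ignore floating-point fuzz
z <- diff(range(x))
d <- min(diff(sort(unique(round(x, 3 - floor(log10(z)))))))
a <- fac * d / 5                # half-width of the uniform noise

x_jit <- jitter(x, factor = fac)
max(abs(x_jit - x)) <= a        # the added noise stays within [-a, a]
cor(x, x_jit)                   # shrinks as fac grows
```

A larger factor widens the noise interval and so lowers the correlation, but jitter gives no direct control over the value of the correlation itself.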
To generate data with a prescribed correlation (or variance), you can start with random data, and rescale it using the Cholesky decomposition of the desired correlation matrix.

# Sample data
Q <- rnorm(10, mean=0, sd=20)
desired_correlations <- matrix(c(
  1, .5, .6, .5,
  .5, 1, .2, .8,
  .6, .2, 1, .5,
  .5, .8, .5, 1 ), 4, 4 )
stopifnot( eigen( desired_correlations )$values > 0 )

# Random data, with Q in the first column
n <- length(Q)
k <- ncol(desired_correlations)
x <- matrix( rnorm(n*k), ncol=k )
x[,1] <- Q

# Rescale, first to make the variance equal to the identity matrix, 
# then to get the desired correlation matrix.
y <- x %*% solve(chol(var(x))) %*% chol(desired_correlations)
var(y)
y[,1] <- Q  # The first column was only rescaled: that does not affect the correlation
cor(y)      # Desired correlation matrix
Vincent Zoonekynd
  • Thank you for your help. In this case you can specify how exactly the variables should be correlated with Q. that's the first part of my question. Part 2 is how to specify the correlation also between these variables. does this also address the second part? – user1984076 Sep 16 '13 at 13:58
  • Yes, you have to specify the whole correlation matrix: the first row (and first column) contains the correlations of the new variables with `Q`; the other elements are the correlations of the new variables between themselves. – Vincent Zoonekynd Sep 16 '13 at 14:21
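
To make the role of the first row concrete, the rescaling from this answer can be wrapped in a small helper. This is only a sketch: the name make_correlated is hypothetical, the 3x3 matrix is an illustrative choice, and Q is assumed to have 100 values rather than 10 so that there are more observations than columns (otherwise var(x) is singular and chol() fails).

```r
# Hypothetical helper: returns a matrix whose first column is Q and whose
# sample correlation matrix equals R. Assumes length(Q) > ncol(R).
make_correlated <- function(Q, R) {
  stopifnot(isSymmetric(R), eigen(R, symmetric = TRUE)$values > 0)
  n <- length(Q)
  k <- ncol(R)
  stopifnot(n > k)
  x <- matrix(rnorm(n * k), ncol = k)
  x[, 1] <- Q
  # Whiten the columns, then impose the desired correlation structure
  y <- x %*% solve(chol(var(x))) %*% chol(R)
  y[, 1] <- Q   # the first column was only rescaled, so correlations are unchanged
  y
}

set.seed(1)
Q <- rnorm(100, mean = 0, sd = 20)
# Row/column 1 is Q, 2 is a1, 3 is a2 (illustrative values)
R <- matrix(c(
  1.00,  0.60,  0.55,
  0.60,  1.00, -0.20,
  0.55, -0.20,  1.00), 3, 3)
y <- make_correlated(Q, R)
round(cor(y), 3)   # first row: correlations of Q with a1 and a2
```

Here the entries 0.60 and 0.55 in the first row are the correlations of a1 and a2 with Q, and -0.20 is the correlation between a1 and a2.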
I answered a very similar question a little while ago:

R: Constructing correlated variables

I am not familiar with jitter, so maybe my solution is more verbose, but it would allow you to determine exactly what the intercorrelations between each of your variables and Q are supposed to be.

The F matrix referenced in that answer describes the intercorrelations that you want to impose on your data.

EDIT to answer question in comment:

If I am not mistaken, you are trying to create a multivariate correlated data set, so all the variables in the set are correlated to varying degrees. I assume Q is your criterion or DV, and a1-a10 are predictors or IVs.

In the F matrix you would reflect the relationships between these variables. For example

  cor_Matrix <- matrix(c(1.00, 0.90, 0.20 ,
                         0.90, 1.00, 0.40 ,
                         0.20, 0.40, 1.00), 
                         nrow=3,ncol=3,byrow=TRUE)

describes the relationships between three variables. The first one could be Q, the second a1 and the third a2. So in this scenario, Q is correlated with a1 (.90) and a2 (.20).

a1 is correlated with a2 (.40)

The rest of the matrix is redundant.

In the remainder of the code, you are simply creating your raw, uncorrelated variables and then imposing the loadings that you have previously pulled from the F matrix.

I hope this helps. If there is a package in R that does all that, please let me know. I built this to help me understand how multivariate data sets are actually generated.

To generalize this to 10 variables plus q, just set the parameters that are set to 3 now to 11 and create an 11x11 F matrix.

SprengMeister
  • I am not familiar with rmvnorm either and it is entirely possible that it offers a less complex solution. I have made an edit to answer your second question. – SprengMeister Sep 16 '13 at 14:02
  • yes this is perfect and exactly what I was looking for.only remaining question is how to specify the correlation matrix for a larger set of variables? for 3 variables you can hand code it but what if there are 50? – user1984076 Sep 16 '13 at 14:22
  • just expand the template. It is currently a 3x3, right? So make that a 50x50. Then change the code to reflect that. Basically, whenever you see a "3", make it a 50. – SprengMeister Sep 16 '13 at 14:25
  • you could generate such a matrix programmatically. If it is structured the way you describe it, I would probably just create it in Excel and then load it into your program. – SprengMeister Sep 16 '13 at 14:33
  • please see my final edit. this is the last issue – user1984076 Sep 16 '13 at 14:39
  • I am not sure what you are trying to do with your while loops. You are only changing the first row and first column. I would put in another question with reproducible code. – SprengMeister Sep 16 '13 at 14:47
  • http://stackoverflow.com/questions/18831058/eigenvalue-decomposition-of-correlation-matrix – user1984076 Sep 16 '13 at 14:53
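
On building a larger matrix programmatically, as suggested in the comments: a correlation matrix has to be symmetric with positive eigenvalues. The matrix in the question is filled cell by cell at random, so it is not symmetric, and eigen() can then return complex values, which is what the error reports. A minimal sketch of a structured alternative follows, shown for Q plus a1...a10; the two groups and the values 0.55, 0.8 and -0.2 are illustrative assumptions, not part of the original answers.

```r
# Q plus a1...a10; group membership and values are illustrative assumptions
k  <- 11
g1 <- 2:6      # a1...a5
g2 <- 7:11     # a6...a10

cor.table <- matrix(-0.2, k, k)   # correlation between the two groups
cor.table[g1, g1] <- 0.8          # correlation within group 1
cor.table[g2, g2] <- 0.8          # correlation within group 2
cor.table[1, ] <- 0.55            # correlations with Q: first row ...
cor.table[, 1] <- 0.55            # ... and first column, keeping symmetry
diag(cor.table) <- 1

# Validity checks before handing the matrix to chol()
isSymmetric(cor.table)
ev <- eigen(cor.table, symmetric = TRUE)$values
all(ev > 0)
```

The same pattern extends to 50 variables by changing k and the index ranges, but the same values need not remain valid at other sizes, so the eigenvalue check is worth keeping.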