
I am trying to simulate a vector that is correlated with a few other vectors. I figured out how to simulate a vector correlated with one other vector, but I can't figure out how to make it correlated with several other vectors.

Here is my code:

library(faux)
p4 <- rnorm_pre(data$p1, mu = 0, sd = 10, r = 0.4, empirical = FALSE)

What I would like to do is specify several vectors for the simulated trait to be correlated with. I'm not sure whether this library is the best one to use.

My data look like this:

 ID  p1  p2  p3 
 1 0.25 0.30 0.02
 2 0.05 0.67 0.18
 3 0.09 0.31 0.38
 4 0.55 0.87 0.21
 5 0.25 0.64 0.01

And I would like to add another column called p4 that is the vector of simulated data, which is correlated to p1 and p3.

Any suggestions are much appreciated.

Rui Barradas
Cae.rich
  • By *"correlations to multiple other parameters"* do you mean *"correlations to multiple other **vectors**"*? – Rui Barradas Apr 28 '21 at 14:15
  • Can you post sample data? Please edit **the question** with the output of `dput(data$p1)`. Or, if it is too big with the output of `dput(head(data$p1, 20))`. – Rui Barradas Apr 28 '21 at 14:16
  • @RuiBarradas, I have updated the question with sample data (the full dataset is very large). Yes, I did mean a vector. – Cae.rich Apr 28 '21 at 14:46

1 Answer


The new vector can be created just as the faux vignette describes.

library(faux)

data$p4 <- rnorm_pre(
  data[-1],             # remove 1st column ID
  mu = 0, 
  sd = 4, 
  r = c(-0.2, 0.2, 0.1)
)

cor(data[-1])
#           p1         p2          p3          p4
#p1  1.0000000  0.5695821 -0.20120754 -0.21833687
#p2  0.5695821  1.0000000 -0.08533300  0.60506386
#p3 -0.2012075 -0.0853330  1.00000000  0.06803646
#p4 -0.2183369  0.6050639  0.06803646  1.00000000
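The p2 row is far from the requested 0.2, which is expected with only five rows. A standard way to build a vector y with target correlations r against the standardized existing columns Z (here p1, p2, p3) is the construction below. This is a textbook sketch, not necessarily how `rnorm_pre` is implemented internally. It hits r only on average over the noise ε, so a single five-row draw can land well away from the target.

```latex
% Z: n x k matrix of standardized columns, r: k-vector of target correlations,
% \varepsilon: n-vector of independent N(0, 1) noise
y = Z R^{-1} r + \sqrt{1 - r^\top R^{-1} r}\,\varepsilon,
\qquad R = \frac{Z^\top Z}{n-1}

\operatorname{E}\!\left[\frac{Z^\top y}{n-1}\right] = R R^{-1} r = r,
\qquad
\operatorname{E}\!\left[\widehat{\operatorname{Var}}(y)\right]
  = r^\top R^{-1} r + \left(1 - r^\top R^{-1} r\right) = 1
```

The construction only works when r^T R^{-1} r < 1, that is, when the requested correlations are compatible with the correlations already present among the existing columns.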

Here is a way to specify correlations with only columns p1 and p3.

data$p5 <- rnorm_pre(
  data[c("p1", "p3")],  # only columns p1 and p3
  mu = 0,
  sd = 1,
  r = c(0.5, -0.2)
)

cor(data[c("p1", "p3", "p5")])
#           p1         p3         p5
#p1  1.0000000 -0.2012075  0.5772403
#p3 -0.2012075  1.0000000 -0.0806465
#p5  0.5772403 -0.0806465  1.0000000
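`rnorm_pre` treats `r` as a population value by default (`empirical = FALSE`), which is why the realized correlations above (0.577 and -0.081) differ from the requested 0.5 and -0.2. If the sample correlations themselves must equal the targets, one option is to strip the noise of any sample correlation with the existing columns before combining. The sketch below does this in base R. `sim_cor_exact` is a hypothetical helper, not a faux function. It assumes more rows than target columns plus one, so that the regression leaves nonzero residuals. It reuses the targets from the p5 example.

```r
# Sample data from the question
data <- data.frame(
  ID = 1:5,
  p1 = c(0.25, 0.05, 0.09, 0.55, 0.25),
  p2 = c(0.30, 0.67, 0.31, 0.87, 0.64),
  p3 = c(0.02, 0.18, 0.38, 0.21, 0.01)
)

# Hypothetical helper (not part of faux): returns a vector whose *sample*
# correlations with the columns of x equal r exactly, with sample mean mu
# and sample sd sigma. Assumes nrow(x) > ncol(x) + 1.
sim_cor_exact <- function(x, r, mu = 0, sigma = 1) {
  z <- scale(as.matrix(x))               # standardize each existing column
  R <- cor(z)                            # correlations among the existing columns
  b <- solve(R, r)                       # weights: z %*% b has sample covariance r with z
  explained <- sum(r * b)                # r' R^-1 r, variance share carried by z
  if (explained >= 1) stop("requested r is incompatible with cor(x)")
  e <- residuals(lm(rnorm(nrow(z)) ~ z)) # noise with zero sample correlation to z
  e <- e / sd(e)                         # rescale noise to unit sample variance
  y <- z %*% b + sqrt(1 - explained) * e
  mu + sigma * as.vector(scale(y))       # fix the sample mean and sd
}

set.seed(1)
p5_exact <- sim_cor_exact(data[c("p1", "p3")], r = c(0.5, -0.2))
cor(cbind(data[c("p1", "p3")], p5_exact))
```

With five rows and two targets, the noise has only two residual degrees of freedom, so little randomness is left in the result. On the full dataset this is not a concern.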

Data in dput format

data <-
structure(list(ID = 1:5, p1 = c(0.25, 0.05, 0.09, 0.55, 0.25), 
    p2 = c(0.3, 0.67, 0.31, 0.87, 0.64), p3 = c(0.02, 0.18, 0.38, 
    0.21, 0.01)), class = "data.frame", row.names = c(NA, -5L))
Rui Barradas
  • Thanks for your response! I am only looking to simulate one vector (p4); the other vectors already exist. When you created the data correlation matrix, it also showed correlations between all the other p# vectors. For example, I want p4 to have a correlation of -0.3 with p1 and 0.2 with p3. I don't think I understand the process of the code. @Rui Barradas – Cae.rich Apr 28 '21 at 20:11
  • @Cae.rich Function `rnorm_pre` creates the new vector with the given mean, sd and correlations. These are population statistics, since `empirical = FALSE` by default. The data correlation matrix is only there to show the actual correlations that the generated vector `p4` ended up having with the other vectors. – Rui Barradas Apr 28 '21 at 21:20
  • @Cae.rich For instance, I asked for `cor(p1, p4) == -0.2` and got `-0.21833687`. `cor(p2, p4)` is very far from the value passed to the function. – Rui Barradas Apr 28 '21 at 21:22
  • Got it! That makes sense. So `r = c(-0.2, 0.2, 0.1)` states the desired correlations between p4 and p1, p2 and p3, respectively? Is there a way to specify the correlation for only some of the vectors, not all of them? For example, for p1 and p3, but with no correlation specified for p2? Thanks for all of your help. – Cae.rich Apr 29 '21 at 20:05
  • @Cae.rich You can specify the correlations for only some of the vectors if you subset the data.frame. I will edit the answer with an example. – Rui Barradas Apr 29 '21 at 21:46
  • When I use the code, it keeps giving me this error: "x must be a vector". Any thoughts? – Cae.rich May 02 '21 at 21:51
  • @Cae.rich With the data as posted I cannot reproduce the error. Are you running the code on data with the same structure? You can try `as.matrix(data[-1])` or `as.matrix(data[c("p1", "p3")])` and see if it solves the problem. – Rui Barradas May 03 '21 at 09:08
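The point that `r` is a population value can be checked without faux. The sketch below is an illustration only. It uses a random `x` as a stand-in for an existing column, builds `y` with population correlation 0.2 to it via `target * z + sqrt(1 - target^2) * noise`, and records the realized sample correlation over many replicates at 5 rows and at 1000 rows.

```r
set.seed(42)
target <- 0.2

# One replicate: population cor(x, y) is `target`; returns the sample cor
sim_once <- function(n) {
  x <- rnorm(n)                                   # stand-in for an existing column
  z <- (x - mean(x)) / sd(x)                      # standardize it
  y <- target * z + sqrt(1 - target^2) * rnorm(n) # signal plus independent noise
  cor(x, y)
}

r_small <- replicate(2000, sim_once(5))     # 5 rows, like the posted sample
r_large <- replicate(2000, sim_once(1000))  # a larger dataset

quantile(r_small, c(0.05, 0.5, 0.95))
quantile(r_large, c(0.05, 0.5, 0.95))
```

The 5-row spread is wide, so a realized value far from the request (like the p2 row in the answer) does not by itself indicate a bug. With 1000 rows, the realized values cluster tightly around 0.2.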