I want to simulate a set of categorical variables that correlate with a simulated numerical variable. More specifically, I have a variable age, defined as age <- rnorm(n=1000, mean=35, sd=9), and I want to simulate another variable, class, in which a higher age makes for a higher class. Can anyone point me in the right direction? Thanks in advance!

Quantizer
Comment: This may be of help: https://stackoverflow.com/questions/66306304/simulation-of-correlated-categorical-and-continuous-data – user63230, Apr 26 '22 at 11:40

1 Answer

As I understand it, if a correlates with b (in the Pearson sense), then a and b are approximately linearly related, so a can be written as a linear function of b. To make the simulated variable random rather than an exact function of b, random noise is added to that linear function.

Here is one way of doing that:

set.seed(1)
age <- rnorm(n = 10, mean = 35, sd = 9)
# Slope: any finite min and max work, but it should be positive here so that class increases with age
beta <- runif(1, min = 1, max = 5)
# Linear function of age plus Gaussian noise (any other mean and sd can be used)
class <- beta * age + rnorm(length(age), mean = 0, sd = 2)

# Check correlation between age and class
cor(age, class)
#[1] 0.9994416

# Show the sorted ages next to the sorted class values (each column is sorted independently)
data.frame(sort(age), sort(class))

   sort.age. sort.class.
1   27.47934    129.6408
2   27.61578    131.3707
3   29.36192    137.5428
4   32.25150    152.3856
5   36.65279    171.3957
6   37.96557    179.0890
7   39.38686    184.8634
8   40.18203    187.9404
9   41.64492    198.2192
10  49.35753    233.2981
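
Note that class as generated above is a continuous variable, whereas the question asks for a categorical one. One possible way to bridge the gap, sketched here under my own assumptions, is to treat the linear-plus-noise value as a latent score and cut it into ordered categories. The names latent and class_cat, the slope of 0.5, the noise sd of 3, and the four labels are illustrative choices, not anything given in the question.

```r
set.seed(1)
n <- 1000
age <- rnorm(n, mean = 35, sd = 9)

# Latent continuous score: linear in age plus Gaussian noise
# (slope 0.5 and sd 3 are assumed values; a larger sd weakens the association)
latent <- 0.5 * age + rnorm(n, mean = 0, sd = 3)

# Cut the latent score at its quartiles to obtain four ordered classes
breaks <- quantile(latent, probs = seq(0, 1, by = 0.25))
class_cat <- cut(latent, breaks = breaks,
                 labels = c("low", "lower-mid", "upper-mid", "high"),
                 include.lowest = TRUE, ordered_result = TRUE)

# Mean age per class; this should increase from "low" to "high"
tapply(age, class_cat, mean)

# Rank correlation between age and the integer codes of the ordered class
cor(age, as.integer(class_cat), method = "spearman")
```

Cutting at quantiles gives classes of roughly equal size. Fixed cut points could be passed to breaks instead if the classes should have unequal sizes.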
Abdur Rohman