The problem:
I have been sampling 5 categories over 6 months, along with their effects on certain environmental activity. Over the months, their proportions have varied like this:
| Month| A| B| C| D| E|
|-----:|---------:|---------:|---------:|---------:|---------:|
| 1| 0.6666667| 0.3012821| 0.0320513| 0.0000000| 0.0000000|
| 2| 0.5603448| 0.1494253| 0.1408046| 0.1235632| 0.0258621|
| 3| 0.1962843| 0.0961228| 0.3400646| 0.2285945| 0.1389338|
| 4| 0.1135647| 0.0368034| 0.4090431| 0.2954784| 0.1451104|
| 5| 0.0799087| 0.0182648| 0.3812785| 0.3835616| 0.1369863|
| 6| 0.0854701| 0.0085470| 0.3760684| 0.4316239| 0.0982906|
As you can see, A and B have gone down while C, D and E have gone up, with these correlations:
| A| B| C| D| E|
|----------:|----------:|----------:|----------:|----------:|
| 1.0000000| 0.9402901| -0.9885358| -0.9437185| -0.9358701|
| 0.9402901| 1.0000000| -0.9511070| -0.9612210| -0.8413999|
| -0.9885358| -0.9511070| 1.0000000| 0.9139291| 0.9559101|
| -0.9437185| -0.9612210| 0.9139291| 1.0000000| 0.7789632|
| -0.9358701| -0.8413999| 0.9559101| 0.7789632| 1.0000000|
The correlation matrix is given by:
Cor_By_Month <- structure(c(1, 0.940290075149674, -0.988535776442558, -0.943718544223924,
-0.935870083299231, 0.940290075149674, 1, -0.951106988627249,
-0.961220998780756, -0.841399937722727, -0.988535776442558, -0.951106988627249,
1, 0.913929137201831, 0.955910074676834, -0.943718544223924,
-0.961220998780756, 0.913929137201831, 1, 0.778963196453952,
-0.935870083299231, -0.841399937722727, 0.955910074676834, 0.778963196453952,
1), .Dim = c(5L, 5L), .Dimnames = list(NULL, c("A", "B", "C", "D", "E")))
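As a cross-check, the means and correlations can be recomputed from the monthly table. This is a minimal sketch, assuming the table values (rounded to 7 decimals) are the full data and that the correlation is Pearson, the `cor()` default. The names `props`, `row_sums`, `col_means` and `cor_from_table` are introduced here for illustration only.

```r
# Monthly proportions copied from the table above (rows = months 1 to 6)
props <- matrix(
  c(0.6666667, 0.3012821, 0.0320513, 0.0000000, 0.0000000,
    0.5603448, 0.1494253, 0.1408046, 0.1235632, 0.0258621,
    0.1962843, 0.0961228, 0.3400646, 0.2285945, 0.1389338,
    0.1135647, 0.0368034, 0.4090431, 0.2954784, 0.1451104,
    0.0799087, 0.0182648, 0.3812785, 0.3835616, 0.1369863,
    0.0854701, 0.0085470, 0.3760684, 0.4316239, 0.0982906),
  ncol = 5, byrow = TRUE,
  dimnames = list(NULL, c("A", "B", "C", "D", "E"))
)

row_sums       <- rowSums(props)   # each month should sum to about 1
col_means      <- colMeans(props)  # average proportion per category
cor_from_table <- cor(props)       # Pearson correlation between categories

print(round(row_sums, 6))
print(round(col_means, 4))
print(round(cor_from_table, 4))
```

Up to rounding of the table values, `cor_from_table` should agree with Cor_By_Month.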
I want to graph the response curves of my models, but instead of varying A from 0 to 1 while keeping the other classes at their means, I want all proportions to add up to 1 and to have the proper correlation values.
Expected solution:
A data frame with at least 100 samples, where all the categories (A to E) vary from 0 to 1 with several intermediate values, every row adds up to 1, and the correlations between variables stay at the values given by Cor_By_Month.
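The requirements above can be written down as a small acceptance check. This is only a sketch: `check_candidate()` is a hypothetical helper, the tolerance of 0.05 on the correlations is an arbitrary assumption, and the candidate used below is a placeholder (independent gamma draws normalised by row), not a proposed solution. The target matrix is Cor_By_Month rounded to 7 decimals, rebuilt so the block runs on its own.

```r
# Hypothetical helper: summarise how well a candidate data set meets the
# requirements (enough rows, values in [0, 1], rows summing to 1,
# correlations close to the target).
check_candidate <- function(x, target, min_rows = 100, tol = 0.05) {
  x <- as.matrix(x)
  max_cor_error <- max(abs(cor(x) - target))
  list(
    n_rows          = nrow(x),
    enough_rows     = nrow(x) >= min_rows,
    rows_sum_to_one = all(abs(rowSums(x) - 1) < 1e-8),
    in_unit_range   = all(x >= 0 & x <= 1),
    max_cor_error   = max_cor_error,
    cor_matches     = max_cor_error <= tol  # tol is an assumed threshold
  )
}

# Lower triangle of Cor_By_Month (7 decimals), filled column by column
r <- c(0.9402901, -0.9885358, -0.9437185, -0.9358701,
       -0.9511070, -0.9612210, -0.8413999,
       0.9139291, 0.9559101,
       0.7789632)
target <- diag(5)
target[lower.tri(target)] <- r
target <- target + t(target) - diag(5)  # mirror to a symmetric matrix

# Placeholder candidate: rows sum to 1, but correlations are not controlled
set.seed(1)
g <- matrix(rgamma(100 * 5, shape = 1), ncol = 5)
candidate <- g / rowSums(g)

result <- check_candidate(candidate, target)
str(result)
```

The placeholder passes the row-sum and range checks but not the correlation check, which is the part I cannot get right.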
What I have tried:
Using mvrnorm from MASS
I know this is not the best way to handle it, since the data are not necessarily normal, but so far it is the only approach I have found.
Knowing that the mean values of my 5 classes are:
Means <- c(0.283706542309262, 0.101740888487065, 0.279885087917025, 0.243803624143928,
0.0908638571427198)
And that the correlation is given by Cor_By_Month
I tried:
library(MASS)  # provides mvrnorm()
out <- as.data.frame(mvrnorm(1000, mu = Means,
  Sigma = Cor_By_Month,
  empirical = TRUE))
But of course the values go all over the place and do not stay within 0 to 1, even though they have the required correlations. To try to correct that, I scaled each column by its min and max values:
mins <- apply(out, 2, min)
maxs <- apply(out, 2, max)
out <- scale(out, center = mins, scale = maxs - mins)
That fixes one of my two problems: all the values of A to E are now between 0 and 1. However, the rows of my data frame sum to values well over 1.
To fix this I tried the following:
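library(MASS)   # mvrnorm()
library(dplyr)  # %>%, rowwise(), mutate(), across(), c_across()
out <- as.data.frame(mvrnorm(1000, mu = runif(n = 5),
  Sigma = Cor_By_Month,
  empirical = FALSE))
mins <- apply(out, 2, min)
maxs <- apply(out, 2, max)
# Rescale each column to [0, 1], then divide each row by its total
out <- scale(out, center = mins, scale = maxs - mins) %>%
  as.data.frame() %>%
  rowwise() %>%
  mutate(Total = sum(c_across(V1:V5))) %>%
  mutate(across(V1:V5, ~ .x / Total)) %>%
  # Recompute the row total as a check; it should now be 1
  mutate(Total = sum(c_across(V1:V5))) %>%
  ungroup() %>%
  as.data.frame()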
Now everything adds up to 1 row-wise, but proportions rarely exceed 0.5: even with 300000 samples, no value was above 0.54.
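To see what each of the two steps does, they can be checked separately. This is a rough diagnostic sketch, not a fix. Min-max scaling is a positive linear rescaling of each column, so it should leave Pearson correlations unchanged, whereas dividing by the row total is not linear and can shift them. The names `raw`, `scaled`, `normalised` and the seed are choices made here. The target is Cor_By_Month rounded to 7 decimals, and the mean is set to 0 because min-max scaling removes it anyway.

```r
library(MASS)  # mvrnorm()

# Lower triangle of Cor_By_Month (7 decimals), filled column by column
r <- c(0.9402901, -0.9885358, -0.9437185, -0.9358701,
       -0.9511070, -0.9612210, -0.8413999,
       0.9139291, 0.9559101,
       0.7789632)
target <- diag(5)
target[lower.tri(target)] <- r
target <- target + t(target) - diag(5)  # mirror to a symmetric matrix

set.seed(42)  # arbitrary seed, for reproducibility only
raw <- mvrnorm(1000, mu = rep(0, 5), Sigma = target)

# Step 1: min-max scale each column to [0, 1]
mins <- apply(raw, 2, min)
maxs <- apply(raw, 2, max)
scaled <- scale(raw, center = mins, scale = maxs - mins)

# Step 2: divide each row by its total so that every row sums to 1
normalised <- scaled / rowSums(scaled)

# Largest change in any pairwise correlation after each step
cor_shift_scaled     <- max(abs(cor(scaled) - cor(raw)))
cor_shift_normalised <- max(abs(cor(normalised) - cor(raw)))

# Largest proportion reached by each category after normalisation
col_max <- apply(normalised, 2, max)

print(cor_shift_scaled)
print(cor_shift_normalised)
print(round(col_max, 3))
```

If the column maxima stay near 0.5, one plausible reason is that strongly positively correlated categories (such as A and B) are large in the same rows, so neither can dominate the row total.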
I am sure there are better solutions to what I am trying to do.