I have a simple model for a company with two farms, growing two crops (apples and pears) on each farm. The first step is just to multiply the number of trees by the amount of fruit on each tree.
The amount of fruit on each tree is simulated (across farms and crops).
I face at least three decisions when modeling this in R:
- How to structure the variables
- How to simulate
- How to multiply a simulated variable with a non-simulated variable
I want it to work even if I add another crop and/or farm - and ideally even if I add another dimension, e.g. crop variety (Granny Smith etc). I also want to refer to farms and crops by name, not by an index number.
Here is the approach I have come up with. It works, but it is hard to add another dimension, and it is a lot of code. Is there a neater way?
To structure the variables:
farms <- c('Farm 1', 'Farm 2');
crops <- c('Pear', 'Apple');
params <- c('mean','sd');
numTrees <- array(0, dim=c(length(farms), length(crops)), dimnames=list(farms,crops));
fruitPerTree <- array(0, dim=c(length(farms), length(varieties), length(params)),
dimnames=list(farms,varieties,params));
# input data e.g.
numTrees['Farm 1', 'Pear'] = 100
# and
fruitPerTree['Farm 1', 'Pear', 'mean'] = 50
To simulate:
simNormal2D <- function(dataVar, numSims) {
#
# Generate possible outcomes for dataVar (e.g. fruitPerTree).
# It generates them for each value of the first two dimensions.
#
# Args:
# dataVar: 3-dimensional array,
# with 3rd dim being the normal params
# numSims: integer, e.g. 10000
#
# e.g. sims <- simNormal2D(fruitPerTree, 10000)
#
# Returns:
# a 3D array with 3rd dimension indexing the simulated results
#
dims <- dimnames(dataVar);
sims <- array(dim=c(length(dims[[1]]), length(dims[[2]]), 0),
dimnames=list(dims[[1]], dims[[2]], NULL));
for(x in dims[[1]]) {
for(y in dims[[2]]) {
sim <- rnorm(numSims, dataVar[x, y, 'mean'],
dataVar[x, y, 'sd'] );
sims <- append(sims, sim);
}
}
# R fills array from first arg columnwise, so dims are reversed
sims <- array(sims, c(numSims, length(dims[[2]]),
length(dims[[1]])),
dimnames=list(c(1:numSims), dims[[2]], dims[[1]]));
# reverse them back again
sims <- aperm(sims, c(3,2,1));
return(sims);
}
simFruitPerTree <- simNormal2D(fruitPerTree, numSims);
To multiply simFruitPerTree
and numTrees
, I find I first have to perform a manual broadcast:
simNumTrees <- array(numTrees, dim=c(length(dims[[1]]), length(dims[[2]]), numSims),
dimnames=list(dims[[1]], dims[[2]], c(1:numSims)));
simTotalFruit <- simFruitPerTree * simNumTrees;
For comparison, in Python (which I know better than R), I can perform all these steps in a few lines by using tuples to index a dictionary, along with two dictionary comprehensions, e.g.:
fruit_per_tree = {}
fruit_per_tree[('Farm 1', 'Pear')] = (50, 15) # normal params
sim_fruit_per_tree = {key: random.normal(*params, size=num_sims)
for key, params in fruit_per_tree.items() }
sim_total_fruit = {key: sim_fruit_per_tree[key]*num_trees[key] for key in num_trees }
Is there an easy way in R too? Thanks!