0

I've got a problem of allocation of individuals by zones. To simplify this, let's say we have 5 sets, each one with a given (and decreasing) population.

S1=1000
S2=100
S3=50
S4=25
S5=5

I'd like to allocate these populations to zones (say, 10 zones Z1-Z10). Each zone has a given probability of hosting the individuals from the sets.

In a matrix form:

N<-matrix(prob,nrow=5,ncol=10)

N probabilities per set being something like:

0.15  0.05  0.1  0.05 0.05  0.2  0.01 0.09  0.15  0.15
...........
0.15  0.05  0.1  0.05 0.05  0.2  0.01 0.09  0.15  0.15

I'd like to know the resulting population, for each set, for each zone. In the first row there's no problem, because the population is high. Just multiplying 1000 by the probabilities:

S1 allocation is OK:

150  50  100  50 50 200 10  90  150  150  (individuals)

but, when you get to set S3, popul. 50, rounding is not easy as some zones would get less than 1 individual:

S3 allocation:

7.5   2.5  5  2.5  2.5  10  0.5 4.5 7.5  7.5 (individuals)

You may even get to potential sets with only 1 individual to be allocated to one of 10 zones.

How can I use the sample function in R (sample(data,size,prob)), or a similar one, to produce the allocation matrix that calculates an integer number of individuals per zone?

NB: obviously in the real problem the no. of zones is much higher, and probabilities vary for each zone.

Thanks in advance, dev

user3310782
  • 811
  • 2
  • 10
  • 18

1 Answers1

1

You can use rmultinom to sample. For example rmultinom(1, S1, prob). And to get the desired matrix, just use it inside sapply.

t(sapply(c(S1, S2, S3, S4, S5), 
         rmultinom, 
         n=1, 
         prob=prob))

EDIT: It is not entirely clear what your criteria for rounding is. Here are two possibilities:

fct <- function(s, prob, method=c("multinom", "maxima")){
  method <- match.arg(method)
  sp <- s*prob
  res <- floor(sp)
  if (sum(res) < s) {
    size <- s-sum(res)
    prob <- sp-res
    if (method=="multinom")
      # rmultinom version
      res <- res + rmultinom(n=1, size=size, prob=prob) 
    if (method=="maxima"){
      # maximum version
      rnk <- rank(prob, ties="random")
      res[rnk <= size] <- res[rnk <= size] + 1
    }
  }
  return(res)
}
t(sapply(c(S1, S2, S3, S4, S5), fct, prob=prob, method="mult"))
t(sapply(c(S1, S2, S3, S4, S5), fct, prob=prob, method="max"))
shadow
  • 21,823
  • 4
  • 63
  • 77
  • Thanks, that's very ingenious. I had never heard of the multinomial function!! – user3310782 Sep 11 '14 at 10:24
  • Can I just add one little 'but'? Say you need consistency in the allocation: a zone with prob 0.15 would get 150 out of 1000 individuals, that's perfect. The problem comes because, when using rmultinom, you can get 150, or 127!! Ideally what I'm trying is to allocate the right # of individuals doing a rounding that, when a zone has a probability <1 (0.37) doesn't produce 0 0 0 .... for every zone, so in the end does not allocate the total population. Any ideas? – user3310782 Sep 11 '14 at 10:30