4

I would like to simulate the distribution for a fixed number of balls m in a fixed number of bins n in R. Up till now I have been using the Poisson approximation with rpois(). This is a decent approximation for a large number of balls in n bins.

However, rpois() only allows you to indicate a rate lambda, which is m/n. As a consequence, the number of positive bins is often smaller than the number of balls.

Would anybody know of a function or script that allows me to randomly distribute balls into bins?

Ultimately I seek to calculate the confidence intervals of -log(empty bins/total bins) by bootstrapping. This problem is 'breaking my balls' so to speak.

Svencken
  • 479
  • 6
  • 14

1 Answers1

2

I think you want the multinomial distribution.

Here's a quick function - we take m balls in n bins, and give x results, returning a vector of your metric for each of the x trials:

myfunc <- function(m,n,x){
  out <- rmultinom(x,m,rep(1,n))
  -log(colSums(out == 0)/n)
}

myfunc(10,40,10)
[1] 0.1923719 0.2548922 0.2231436 0.2548922 0.2876821 0.2876821 0.2231436 0.2231436 0.2231436 0.2548922

You can then get the quantiles/Confidence intervals:

out = myfunc(10,40,1000)
quantile(out, c(0.05,0.95))
       5%       95% 
0.1923719 0.2876821 
jeremycg
  • 24,657
  • 5
  • 63
  • 74
  • 1
    Folks, thanks a lot for your help, this has both answered my question and massively sped up my bootstrapping. Still fairly new to R myself, but getting more familiar one step at a time. I'll put up some finished code when ready. – Svencken Aug 08 '17 at 07:19