I am trying to generate data from a multinomial distribution in R using the function rmultinom, but I am having some problems. I want a data frame with 50 rows and 20 columns, and a total sum of the outcomes equal to 3 times n*p.

I am using this code:

p <- 20
n <- 50
N <- 3*(n*p)
prob_true <- rep(1/p, p)
a <- rmultinom(50, N, prob_true)

But I get some very strange results and a data frame with 20 rows and 50 columns. How can I solve this problem?

Thanks in advance!

Bibi

1 Answer

The help available at ?rmultinom says that n in rmultinom(n, size, prob) is:

"number of random vectors to draw"

And size is: "specifying the total number of objects that are put into K boxes in the typical multinomial experiment"

And the help says that the output is: "For rmultinom(), an integer K x n matrix where each column is a random vector generated according to the desired multinomial law, and hence summing to size"
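
A small sketch of what that K x n shape looks like in practice. The numbers here (4 categories, 3 draws, 10 objects per draw) and the name `x` are made up for illustration and are not from the question:

```r
set.seed(42)  # arbitrary seed, only so the draw is reproducible

# K = 4 equally likely categories, n = 3 draws, size = 10 objects per draw
x <- rmultinom(n = 3, size = 10, prob = rep(1/4, 4))

dim(x)      # 4 3: one row per category, one column per draw
colSums(x)  # 10 10 10: each draw (column) sums to size
```

So the number of rows comes from length(prob), and the number of columns comes from n.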

So you're asking for 50 vectors/variables, each with a total number of "objects" equal to 3000. Each of the 50 columns of the result is one such vector: 20 counts (one per category) that sum to 3000.

colSums(a) does return 3000 for every column.
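
This can be checked directly on the call from the question. It may also explain why the results look strange: the whole matrix sums to 50 * 3000, not 3000. The seed is arbitrary and only there for reproducibility:

```r
set.seed(1)  # arbitrary seed

p <- 20
n <- 50
N <- 3 * (n * p)                    # 3000
a <- rmultinom(50, N, rep(1/p, p))  # same call as in the question

dim(a)              # 20 50: 20 categories (rows), 50 draws (columns)
unique(colSums(a))  # 3000: every column sums to size
sum(a)              # 150000: 50 draws of 3000 objects each
mean(a)             # 150: average count per cell, 3000 / 20
```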

Do you want your vectors/variables as rows? Then transposing a is enough:

t(a)

But if you want 20 columns, each of which is its own variable, you need to switch your n and p (I also substituted n for the hard-coded 50 in the rmultinom call):

n <- 20
p <- 50
N <- 3*(n*p)
prob_true <- rep(1/p, p)
a <- rmultinom(n, N, prob_true)
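
Both options give a 50 x 20 result, but they mean different things, which a quick check makes visible. The last part is only a guess at another reading of the question, namely that the whole table (rather than each draw) should sum to 3*n*p. The names `a_t`, `a_swapped` and `a_total` are made up for this sketch:

```r
set.seed(1)  # arbitrary seed, only for reproducibility

p <- 20           # categories (as in the question)
n <- 50           # draws (as in the question)
N <- 3 * (n * p)  # 3000

# Option 1: original call, transposed.
# 50 draws as rows, 20 categories as columns.
a_t <- t(rmultinom(n, N, rep(1/p, p)))
dim(a_t)             # 50 20
range(rowSums(a_t))  # 3000 3000: every row sums to 3000

# Option 2: n and p swapped.
# 20 draws as columns, 50 categories as rows.
a_swapped <- rmultinom(p, N, rep(1/n, n))
dim(a_swapped)             # 50 20
range(colSums(a_swapped))  # 3000 3000: every column sums to 3000

# Assumed alternative reading: the grand total should be 3 * n * p,
# so each of the 50 rows is drawn with size 3 * p = 60.
a_total <- t(rmultinom(n, 3 * p, rep(1/p, p)))
dim(a_total)  # 50 20
sum(a_total)  # 3000
```

If a data frame rather than a matrix is needed, as.data.frame() can be applied to any of these results.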
Peter O.
brian avery