
I have a small exercise to solve with RStudio for my statistics exam. I tried to translate it into English, so if something isn't clear, please ask me for an explanation.

"Simulate 100,000 births using the sample function, with the following probabilities: males 51.3%, females 48.7%.

  • Check how much the numbers of males and females obtained differ from the theoretical percentages.

  • Draw the PMF and the CDF of the probability function of this experiment (on a sample of 50 births).

  • Calculate mean and variance of the distribution."

I obtained 51356 males and 48644 females, a difference of 56 from the theoretical counts (51300 and 48700).

But now, how can I draw the PMF and CDF of the probability function?

Here is the code I used to simulate the births:

# simulate 100,000 births: "M" with probability 0.513, "F" with probability 0.487
mysample <- data.frame(Gender = sample(c("M", "F"), 100000, replace = TRUE, prob = c(0.513, 0.487)))
males <- subset(mysample, Gender == "M")
females <- subset(mysample, Gender == "F")

# theoretical (expected) counts
theoricM <- 100000 * 0.513
theoricF <- 100000 * 0.487
# observed counts
realM <- nrow(males)
realF <- nrow(females)

# create a data frame to show differences
result <- data.frame(realM, theoricM, realF, theoricF)
names(result) <- c("Males", "Theoretical Males", "Females", "Theoretical Females")
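For comparison, here is a more compact sketch of the same check, following the `?table` hint: `table()` counts both outcomes at once, and the observed counts and percentages can be put next to the theoretical ones. The `set.seed()` call and its value are arbitrary additions, only there to make the run reproducible.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
n <- 100000
p <- c(M = 0.513, F = 0.487)

births <- sample(c("M", "F"), n, replace = TRUE, prob = p)

# observed counts, reordered to match p (table() sorts levels alphabetically)
observed <- as.vector(table(births)[names(p)])
expected <- n * p

comparison <- data.frame(
  observed     = observed,
  expected     = expected,
  difference   = observed - expected,
  observed_pct = 100 * observed / n,
  expected_pct = 100 * p
)
print(comparison)
```

The `difference` column plays the role of the "difference of 56" above; its exact value changes with every simulation.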

And the results:

[image: the `result` data frame]

I hope someone can help me. I know it's a very easy question for someone experienced with R, but I'm just starting out with this language.

Thank you to everyone who replies.

EDIT:

I tried this code:

x <- 1:50
plot(x,dbinom(x ,size = 50,prob = 0.513),type="l", ylab="PMF", main="Binomial Distribution PMF")

And the result is:

[image: line plot of the Binomial(50, 0.513) PMF]

What I think I understand is that, since the probability is very close to 1/2, in a set of 50 births the number of males will be around 25. Is that what the plot is showing? And is this the correct way to do it?
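The "around 25" intuition can be made precise, which also covers the last part of the exercise: for a Binomial(n, p) the mean is np and the variance is np(1 - p). The sketch below computes both directly from the PMF over the full support 0..50 and checks them against those formulas; with n = 50 and p = 0.513 the mean is 25.65 and the variance is about 12.49.

```r
n <- 50
p <- 0.513
x <- 0:n                                # full support of Binomial(50, p), including 0
pmf <- dbinom(x, size = n, prob = p)

# moments computed from the PMF
mu <- sum(x * pmf)
sigma2 <- sum((x - mu)^2 * pmf)

# closed-form binomial moments
mu_formula <- n * p                     # 25.65
sigma2_formula <- n * p * (1 - p)       # 12.49155

c(mean = mu, variance = sigma2)
```

So the peak of the plot sits slightly above 25, and the standard deviation is about 3.5 males.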

Jongware
Gio Bact
  • hints: `?table`, `?cumsum` – Ben Bolker Jan 07 '15 at 15:02
  • Sorry, I don't understand; could you please explain in more detail? – Gio Bact Jan 07 '15 at 15:12
  • I'm unwilling to give you too much help on an exam. Actually, in re-reading the question I think you should look at `?dbinom` (you're being asked to plot the *theoretical* sampling distribution, not the observed distribution from a large number of simulations). – Ben Bolker Jan 07 '15 at 15:17
  • This is not the exam (of course!); it's an exercise in preparation for the exam, so I have to learn how to solve it to pass. I had already guessed it was a binomial distribution, but I can't work out how to create the theoretical PMF. For example, in this code, how can I specify that there are only two possible outcomes (M and F)? `x <- 1:50; plot(x,dbinom(x ,size = 50,prob = c(0.513,0.487)),type="l", ylab="PMF", main="Binomial Distribution PMF")` I also tried using something like `data <- sample(c("M","F"),50,replace=T,prob=c(0.513,0.487))` instead of x, but it's non-numeric. – Gio Bact Jan 07 '15 at 15:43
  • you should add this stuff to your question, so that it's clearer what you tried. (You're close.) Hint #2: for a binomial, you don't need to specify both probabilities -- just the probability of whichever outcome you're counting as "success". (The problem would be a lot harder for a multinomial distribution with >2 categories ...) – Ben Bolker Jan 07 '15 at 15:48
  • OK, I followed your hints and edited my question. Please check it now. Thank you for your patience. – Gio Bact Jan 07 '15 at 16:02
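As a sketch of what the `?table` / `?cumsum` hints point to, the *observed* distribution can be built by simulating many samples of 50 births, counting the males in each, and tabulating the counts; it should sit close to the theoretical `dbinom()` values that the exercise actually asks for. The number of replicates (10,000) and the seed are arbitrary choices for illustration.

```r
set.seed(42)   # arbitrary seed for reproducibility
n <- 50        # births per sample
p <- 0.513     # probability of a male (the "success")
reps <- 10000  # number of simulated samples (arbitrary)

# number of males in each simulated sample of 50 births
n_males <- replicate(reps,
  sum(sample(c("M", "F"), n, replace = TRUE, prob = c(p, 1 - p)) == "M"))

# empirical PMF: relative frequency of each count 0..n (factor keeps zero-count values)
emp_pmf <- as.vector(table(factor(n_males, levels = 0:n))) / reps
emp_cdf <- cumsum(emp_pmf)

# theoretical PMF and CDF for comparison
x <- 0:n
th_pmf <- dbinom(x, size = n, prob = p)
th_cdf <- pbinom(x, size = n, prob = p)

plot(x, th_pmf, type = "h", ylab = "PMF",
     main = "Theoretical (lines) vs simulated (points) PMF")
points(x, emp_pmf, pch = 20)
```

Note that only the probability of the outcome being counted (here "M") enters `dbinom()`, as the hint about specifying a single probability suggests.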

1 Answer


Your code (and conclusion) look correct to me.

It might be graphically better to use type="h" to draw a "high-density" plot; this makes it clearer that there is zero probability for non-integer values of x.

x <- 0:50   ## support of Binomial(50, p) starts at 0 (P(X=0) is negligible here anyway)
par(las=1,bty="l") ## cosmetic
plot(x,dbinom(x,size = 50,prob = 0.513),type="h", ylab="PMF", 
     main="Binomial Distribution PMF")

[image: high-density (type="h") plot of the Binomial(50, 0.513) PMF]

(When you plot the CDF/CMF, you may want to use type="s" or type="S"; see ?plot)
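Following that suggestion, a minimal CDF sketch uses `pbinom()` for P(X ≤ x) together with `type="s"`, which holds each value horizontally until the next integer, matching the right-continuous step shape of a discrete CDF. The last line checks that the same values come from `cumsum()` of the PMF.

```r
n <- 50
p <- 0.513
x <- 0:n   # support of Binomial(50, p)

cdf <- pbinom(x, size = n, prob = p)   # P(X <= x)

par(las = 1, bty = "l")   # cosmetic, as in the PMF plot
plot(x, cdf, type = "s", ylab = "CDF", main = "Binomial Distribution CDF")

# equivalently, the CDF is the running sum of the PMF
cdf_from_pmf <- cumsum(dbinom(x, size = n, prob = p))
```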

Ben Bolker