0

I am attemtping to write a function that produces m random simulations of 5 Bernoulli Trials. I create a histogram showing the distribution of the number of successes across the m simulations.

I then need to also plot a line showing the theoretical / normalised distribution around the theoretical mean number of successes.

Here is my function as of now:

function x = generate_binomial_bernoulli(n,p,m)
  % generate Bi(n, p) outcomes m times

  emperical = zeros(1,m);             % allocate array for m simulations
  for i = 1:m                         % iterate over m simulations
    successes = 0;                    % count the number of successful trials per simualtion (0-5)
    for j = 1:n                       % iterate through the n trials
      u = rand;                       % generate random nuumber from 0-1
      if (u <= p)                     % if random number is <= p
        successes++;                  % count it as a success
      endif
    end
    emperical(i) = successes;         % store the number of successful trials in this simulation
  end

  close all;                          % close any existing graphs

  x_values = [0:n];                   % array of x-axis values        
  hist(emperical, x_values, "facecolor", "r"); % plot empirical data
  xlim([-0.5 (n + 0.5)]);             % set x-axis to allow for histogram bar widths

  hold on;                            % hold current graph

  mean = n * p;                       % theoretical mean
  norm = normpdf(x_values, mean, 1);  % normalised y values
  plot(x_values, norm, "color", "b"); % plot theoretical distribution

  legend('Emprical', 'Theoretical');   

end

When the function is called as

generate_binomial_bernoulli(5, 0.2, 100)

I expect to see a red histogram showing the results of 100 simulations (emperical results), and then a blue line plot normalised around the mean of 1 success (theoretical result).

Here is the graph produced:

enter image description here

The emperical results are displayed correctly, but the theoretical plot is only extending to the height of a very low value on the y-axis.

Where is my function going wrong?

KOB
  • 4,084
  • 9
  • 44
  • 88

1 Answers1

1

"normpdf" is a probability density function. So it is "normal" that you see numbers below 1. However, histogram returns the frequency of your numbers, it is not a probability. Maybe you want to normalize your frequencies and treat them as probabilities.

freqs=hist(emperical, x_values, 'facecolor', 'r'); % plot empirical data
freqs=freqs/sum(freqs);
figure; bar(x_values,freqs)
Ozcan
  • 726
  • 4
  • 13
  • WIth the line `freqs=hist(emperical, x_values, 'facecolor', 'r');`, I am getting an empty grpah - if you assign the histogram to a variable, does it not actually get displayed? Do I first have to call `hist(emperical, x_values, 'facecolor', 'r');`, followed by `freqs=hist(emperical, x_values, 'facecolor', 'r');`? – KOB Mar 23 '17 at 19:11
  • 1
    You only get the frequencies, not plotting. Try histogram(emperical, x_values, 'Normalization','pdf'); Please see this http://stackoverflow.com/a/34405518/7659682 – Ozcan Mar 23 '17 at 19:15
  • As you stated that `normpdf` returns a probability, I have changed the plotting line to `plot(x_values, theoretical * m, "color", "b");` and it works perfectly, but is this bad practice? – KOB Mar 23 '17 at 19:18
  • 1
    As you multiply it by something, it will exceed 1 and it is not a pdf anymore. I would rather normalize the histogram, your choice. – Ozcan Mar 23 '17 at 19:21
  • Yes, the question you posted seems a lot more suitable, however I am actually using Octave and not MATLAB, and the property 'Normalization' isn't defined for the `hist` function in Octave. I will see if I can find the Octave equivalent. – KOB Mar 23 '17 at 19:23