
I am having trouble plotting a cumulative distribution function (CDF).

So far I have found this:

scipy.stats.beta.cdf(0.2,6,7)

But that only gives me a point.

This will be what I use to plot:

pylab.plot()
pylab.show()

What I want it to look like is this: File:Binomial distribution cdf.svg

with p = 0.2 and the x-axis stopping once y reaches 1 or gets close to 1.

Wh1T3h4Ck5
Overtim3

2 Answers


The first argument to cdf can be an array of values, rather than a single value. It will then return an array of values.

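import scipy.stats as stats
import matplotlib.pyplot as plt
import numpy as np

# 100 evenly spaced points between 0 and 20 at which to evaluate the cdf
x = np.linspace(0, 20, 100)
cdf = stats.binom.cdf
# cdf of Binomial(n=50, p=0.2) evaluated at every point in x
plt.plot(x, cdf(x, 50, 0.2))
plt.show()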


unutbu
  • This is exactly what I'm looking for! Thank you so much! I spent about 3 hours trying to work up to something like this. But I'm still having a hard time understanding this conceptually. What is this linspace(0,20,100)? As for stats.beta.cdf, was I just using the wrong code? I see you used stats.binom.cdf – Overtim3 Oct 12 '12 at 01:43
  • [np.linspace](http://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html)(0,20,100) creates a numpy array of 100 evenly spaced values between 0 and 20 (inclusive). (Try `print(np.linspace(0,20,10))`. It pays to experiment!) – unutbu Oct 12 '12 at 02:00
  • Since you posted a link to a graph of a binomial cdf, I used [stats.binom.cdf](http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.binom.html) instead of [stats.beta.cdf](http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.beta.html). – unutbu Oct 12 '12 at 02:03
  • I am still confused about how stats.binom.cdf(x,50,0.2) works... I understand that x can be an array and that my p = 0.2, but why 50? I have played with the numbers and have gone to the manual, but it still doesn't make sense... – Overtim3 Oct 12 '12 at 15:21
  • On the [doc page](http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.binom.html) you'll see an example using `binom.cdf(x, n, p)`. The `x` can be any array-like object. `n` is the number of tosses, and `p` is the probability of success. The meanings of the parameters can be understood by looking at [the source code](https://github.com/scipy/scipy/blob/v0.11.0/scipy/stats/distributions.py) and matching it up with this [wikipedia page](http://en.wikipedia.org/wiki/Binomial_distribution). – unutbu Oct 12 '12 at 16:50
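
To tie the parameter discussion above back to the original question (a binomial cdf with p = 0.2 whose x-axis stops once y is close to 1), here is a minimal sketch. It assumes n = 50 as in the answer, evaluates the cdf only at integer k (the binomial is discrete), and uses `stats.binom.ppf` to pick the right edge of the plot. The helper names `binom_cdf_points` and `plot_binom_cdf` and the `tail` cutoff are made up for illustration and are not part of SciPy.

```python
import numpy as np
import scipy.stats as stats


def binom_cdf_points(n, p, tail=1e-4):
    """Return integer k values and the Binomial(n, p) cdf at each k.

    The k range stops at the smallest k whose cdf is at least 1 - tail,
    so the curve ends once it is close to 1. `tail` is an illustrative choice.
    """
    k_max = int(stats.binom.ppf(1 - tail, n, p))
    k = np.arange(k_max + 1)
    return k, stats.binom.cdf(k, n, p)


def plot_binom_cdf(n, p):
    """Draw the cdf as a step function, since it only jumps at integers."""
    import matplotlib.pyplot as plt

    k, cdf_values = binom_cdf_points(n, p)
    plt.step(k, cdf_values, where="post")
    plt.xlabel("k")
    plt.ylabel("P(X <= k)")
    plt.show()
```

Calling `plot_binom_cdf(50, 0.2)` should give a staircase similar to the linked Wikipedia figure; changing `n` changes how many trials the distribution models, which is what the 50 in the answer stands for.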

I don't think the user above, unutbu, has suggested the right function to use. In fact, his answer is largely misleading and incorrect.

Note that binom.cdf() is a function that calculates the cdf of a binomial distribution specified by n and p, Binomial(n,p). That is to say, it returns the value of that random variable's cdf at each value in x, rather than the empirical cdf of the discrete distribution given by the data in the vector x.

To calculate the cdf for any distribution defined by a vector x, just use the histogram() function:

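import numpy as np
data = np.random.randint(0, 10, 100)
# density=True replaces the old normed=True argument
hist, bin_edges = np.histogram(data, density=True)
# multiply by the bin widths so the cumulative sum ends at 1
cdf = np.cumsum(hist * np.diff(bin_edges))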

or, just use the hist() plotting function from matplotlib.
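
As a rough sketch of the data-driven approach described in this answer, the empirical cdf can also be built without binning, by sorting the sample and counting the fraction of points at or below each value. The helper names `empirical_cdf` and `plot_empirical_cdf` are hypothetical and chosen only for this example.

```python
import numpy as np


def empirical_cdf(sample):
    """Return sorted sample values and the fraction of points <= each value."""
    xs = np.sort(np.asarray(sample))
    ys = np.arange(1, xs.size + 1) / xs.size
    return xs, ys


def plot_empirical_cdf(sample):
    """Plot the empirical cdf of `sample` as a step function."""
    import matplotlib.pyplot as plt

    xs, ys = empirical_cdf(sample)
    plt.step(xs, ys, where="post")
    plt.xlabel("x")
    plt.ylabel("fraction of sample <= x")
    plt.show()
```

For example, `plot_empirical_cdf(np.random.randint(0, 10, 100))` plots the same kind of data used above. The matplotlib shortcut mentioned in the answer would be `plt.hist(sample, cumulative=True, density=True)`, which draws a binned version of the same curve.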

songyuanyao
user3567032