I have been looking for a good tutorial or examples of how to use rv_continuous and I have not been able to find one.

I read:

http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.html#scipy.stats.rv_continuous

but it was not really all that helpful (and it lacked any examples of how to use it).

For example, I would like to specify an arbitrary probability distribution, call fit on it, get the pdf I wanted, and then call expect to get the desired expected value.

My understanding so far is that to create a custom probability distribution, we define our own class that subclasses rv_continuous. Then, by specifying a custom _pdf or _cdf, we should be able to use every method that rv_continuous provides, so expect and fit should be available.

However, the thing that is really mysterious for me is, if we don't tell rv_continuous explicitly what the parameters are that specify the probability distribution, is it really able to do all those methods correctly? How does it even do it just with _pdf or _cdf?

Or did I just misunderstand how it works?

Also, if you can provide a simple example of how it works and how to use expect and/or fit, that would be awesome! A link to a better tutorial would be great too.

Thanks in advance.

Charlie Parker

1 Answer

Here's a tutorial: https://docs.scipy.org/doc/scipy/tutorial/stats.html

Basically, rv_continuous is made for subclassing. Use it if you need a distribution that is not already defined in scipy.stats (which ships more than 70 of them).

Re how it works. In a nutshell, it uses generic code paths: if your subclass defines _pdf and does not define _logpdf, then it inherits

def _logpdf(self, x, *args):
    return log(self._pdf(x, *args))

and a bunch of similar methods (see https://github.com/scipy/scipy/blob/master/scipy/stats/_distn_infrastructure.py for precise details).
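To make the generic code paths concrete, here is a minimal sketch, assuming a made-up density f(x) = 2x on [0, 1]. The class name TriangleDist is hypothetical and not part of scipy. Only _pdf is defined; cdf, logpdf and expect are all inherited from rv_continuous and computed numerically from _pdf.

```python
import numpy as np
from scipy import stats


class TriangleDist(stats.rv_continuous):
    """Hypothetical example distribution: f(x) = 2x on [0, 1]."""

    def _pdf(self, x):
        return 2.0 * x


# a and b set the support; without them the support is the whole real line.
tri = TriangleDist(a=0.0, b=1.0, name="triangle_sketch")

cdf_half = tri.cdf(0.5)        # generic path: numerical integral of _pdf (analytically 0.25)
logpdf_half = tri.logpdf(0.5)  # generic path: log of _pdf (analytically log(1) = 0)
mean = tri.expect()            # E[X], analytically 2/3
second_moment = tri.expect(lambda x: x**2)  # E[X^2], analytically 1/2

print(cdf_half, logpdf_half, mean, second_moment)
```

The generic paths rely on numerical integration and root finding, so they tend to be slower than hand-written _cdf or _ppf methods; overriding those is optional.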

Re parameters. You probably mean shape parameters, don't you? They are inferred automagically by inspecting the signature of _pdf or _cdf, see https://github.com/scipy/scipy/blob/master/scipy/stats/_distn_infrastructure.py#L617. If you want to bypass the inspection, provide the shapes parameter to the constructor of your instance:

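from scipy import stats

class Mydist(stats.rv_continuous):
    def _pdf(self, x, a, b, c, d):
        return 42  # placeholder only, not a real (normalized) density
# Passing shapes explicitly bypasses the signature inspection.
mydist = Mydist(shapes='a, b, c, d')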

[Strictly speaking, this only applies to scipy 0.13 and above. Earlier versions were using a different mechanism and required the shapes attribute.]
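Since the question asks for expect and fit specifically, here is a small sketch of both with one shape parameter. It assumes a made-up density f(x; k) = k x^(k-1) on [0, 1]; the names PowerDist and k are hypothetical. The sample is drawn by inverting the CDF by hand rather than through rvs, because the generic rvs path inverts a numerically integrated CDF and can be slow.

```python
import numpy as np
from scipy import stats


class PowerDist(stats.rv_continuous):
    """Hypothetical example: f(x; k) = k * x**(k - 1) on [0, 1], k > 0."""

    def _pdf(self, x, k):
        return k * x ** (k - 1.0)


power = PowerDist(a=0.0, b=1.0, name="power_sketch")
inferred_shapes = power.shapes  # shape names taken from the _pdf signature
n_shapes = power.numargs

# expect receives shape parameters through `args`; analytically E[X] = k / (k + 1).
mean_k3 = power.expect(args=(3.0,))

# Draw a sample with k = 3 by inverting the CDF F(x) = x**k by hand.
rng = np.random.default_rng(0)
data = rng.random(1000) ** (1.0 / 3.0)

# fit returns (shapes..., loc, scale); floc and fscale hold loc and scale fixed,
# so only k is estimated (by numerical maximum likelihood).
k_hat, loc_hat, scale_hat = power.fit(data, floc=0, fscale=1)

# Closed-form maximum likelihood estimate for this density, for comparison.
k_mle = -len(data) / np.log(data).sum()

print(inferred_shapes, n_shapes, mean_k3, k_hat, k_mle)
```

Without floc and fscale, fit also estimates loc and scale, which every rv_continuous subclass gets in addition to its shape parameters.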

ev-br
  • So for example, if I fit some data using KDE (kernel density estimation) and I want to compute its expected value or entropy or something, do I just create a subclass of rv_continuous, use my_kde_pdf as the _pdf, and then call the expect method to get the corresponding expectation? – Charlie Parker Mar 17 '14 at 15:13
  • In principle, yes. Assuming that you actually want it, rather than just DIYing the integral for the entropy or whatever expectation value you're after. – ev-br Mar 17 '14 at 16:51
  • DIYing? What do you mean by that? – Charlie Parker Apr 03 '14 at 16:23
  • Just coding one integral manually might or might not be easier than subclassing rv_continuous etc., that's all I meant. – ev-br Apr 04 '14 at 06:30
  • Yeah, that's what I did in the end. rv_continuous is too annoying. – Charlie Parker Apr 04 '14 at 06:42
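For reference, the two options discussed in these comments can be sketched side by side. This is only an illustration under assumptions: the sample is made up, the helper kde_pdf and the class KdeDist are hypothetical names, and the integration range is truncated to finite limits chosen wide enough to hold essentially all of the KDE's mass.

```python
import numpy as np
from scipy import integrate, stats

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=200)  # made-up sample
kde = stats.gaussian_kde(data)

# Finite limits, far enough outside the data to capture nearly all the mass.
lo, hi = data.min() - 5.0, data.max() + 5.0


def kde_pdf(x):
    """Evaluate the KDE at x (scalar or array), preserving the input shape."""
    x = np.asarray(x, dtype=float)
    return kde(x.ravel()).reshape(x.shape)


def entropy_integrand(x):
    p = float(kde_pdf(x))
    return -p * np.log(p) if p > 0.0 else 0.0


# Option 1: code the integrals by hand.
mean_diy, _ = integrate.quad(lambda x: x * float(kde_pdf(x)), lo, hi)
entropy_diy, _ = integrate.quad(entropy_integrand, lo, hi)


# Option 2: wrap the KDE in an rv_continuous subclass and call expect.
class KdeDist(stats.rv_continuous):
    def _pdf(self, x):
        return kde_pdf(x)


kde_dist = KdeDist(a=lo, b=hi, name="kde_sketch")
mean_rv = kde_dist.expect()

print(mean_diy, mean_rv, entropy_diy)
```

Both means should agree closely with the sample mean, since each Gaussian kernel is centered on a data point. The subclass route does more work under the hood, so for a single integral the hand-coded version may well be the simpler choice.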