2

scipy.stats.entropy calculates the differential entropy for a continuous random variable. By which estimation method, and with which formula, exactly is it calculating the differential entropy (e.g. the differential entropy of a normal distribution versus that of a beta distribution)?

Below is its GitHub code. Differential entropy is the negative integral of the p.d.f. times the log of the p.d.f., h(X) = -∫ p(x) log p(x) dx, but nowhere do I see this integral or the log written out. Could it be hidden in the call to integrate.quad?

def _entropy(self, *args):
    def integ(x):
        val = self._pdf(x, *args)
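        # entr(val) = -val * log(val) elementwise (scipy.special.entr)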
        return entr(val)

    # upper limit is often inf, so suppress warnings when integrating
    _a, _b = self._get_support(*args)
    with np.errstate(over='ignore'):
        h = integrate.quad(integ, _a, _b)[0]

    if not np.isnan(h):
        return h
    else:
        # try with different limits if integration problems
        low, upp = self.ppf([1e-10, 1. - 1e-10], *args)
        if np.isinf(_b):
            upper = upp
        else:
            upper = _b
        if np.isinf(_a):
            lower = low
        else:
            lower = _a
        return integrate.quad(integ, lower, upper)[0]

Source (lines 2501 - 2524): https://github.com/scipy/scipy/blob/master/scipy/stats/_distn_infrastructure.py
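
For what it's worth, the documentation for scipy.special.entr (imported as entr above) says it computes entr(p) = -p*log(p) elementwise; a quick check of that claim:

import numpy as np
from scipy.special import entr

p = np.array([0.1, 0.5, 0.9])
print(entr(p))           # [0.23025851 0.34657359 0.09482446]
print(-p * np.log(p))    # identical values

If that is right, integ(x) is exactly the integrand -p(x) log p(x), and integrate.quad sums it over the support.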

develarist
  • I think this is the actual function for computing differential entropy: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.differential_entropy.html – user1769197 Dec 13 '22 at 03:30

2 Answers

1

Unless you work with an approximation, you have to store a continuous random variable in some parametrized form anyway. In that case you usually work with distribution objects, and for well-known distributions closed-form formulae for the differential entropy in terms of the parameters exist.

SciPy accordingly provides an entropy method for rv_continuous that calculates the differential entropy where possible:

In [5]: import scipy.stats as st

In [6]: rv = st.beta(0.5, 0.5)

In [7]: rv.entropy()
Out[7]: array(-0.24156448)
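
For distributions with a known closed form you can sanity-check the value; e.g. the differential entropy of a normal distribution with standard deviation σ is ½·ln(2πeσ²) (a quick sketch, using nothing beyond scipy and numpy):

In [8]: import numpy as np

In [9]: st.norm(scale=2.0).entropy()
Out[9]: array(2.11208571)

In [10]: 0.5 * np.log(2 * np.pi * np.e * 2.0**2)   # ½·ln(2πeσ²)
Out[10]: 2.112085713764618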
phipsgabler
  • rv_continuous.entropy only returns True or False, not a value for the differential entropy. What does this mean? – develarist Aug 04 '20 at 10:21
  • What? That's not what it's supposed to do, see [here](https://github.com/scipy/scipy/blob/v1.5.2/scipy/stats/_distn_infrastructure.py#L1135-L1169). Are you sure you're calling it correctly? – phipsgabler Aug 04 '20 at 15:13
  • Even the example in your link shows a Boolean output: `>>> drv = rv_discrete(values=((0, 1), (0.5, 0.5)))` `>>> np.allclose(drv.entropy(), np.log(2.0))` **True** – develarist Aug 04 '20 at 19:06
  • Yeah, sure. As a result of a call to `np.allclose`. Which shows that the result of `entropy` is approximately equal to `log(2)`, _an irrational number_. – phipsgabler Aug 04 '20 at 19:21
  • So you're saying rv_continuous.entropy is normally supposed to return a real number, not a Boolean? Please show an example without np.allclose that works – develarist Aug 04 '20 at 19:31
  • It sure should, according to both the docs and the source code I linked to. I can't produce an example myself, as I'm on my phone; why don't you just try it? – phipsgabler Aug 04 '20 at 19:35
  • I did try before all this, and it gives an error, I think because I don't know how to discretize a continuous r.v. That's why I thought np.allclose did the discretizing, but now I'm told that even that is what's causing the Boolean output – develarist Aug 04 '20 at 19:37
  • I don't quite get what you try to do. There's no need to discretize if you can represent the continuous distribution in parametrized form. See my edit. – phipsgabler Aug 05 '20 at 17:13
  • ok, I've figured out the Boolean issue and am getting a numerical entropy value now. But how is `.entropy` actually calculating it for these families of distributions? What formula and estimation method for entropy is the function using? – develarist Dec 27 '20 at 19:21
0

The actual question here is how you store a continuous random variable in memory. You might use some discretization technique and compute the entropy of the resulting discrete random variable.
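
A minimal sketch of that idea (my own illustration, not from the scipy docs): bin samples into a histogram, take the discrete entropy of the bin probabilities with scipy.stats.entropy, and add log(bin width) to turn it into an estimate of the differential entropy:

import numpy as np
import scipy.stats as st

# Sample from a standard normal, whose differential entropy
# is 0.5*log(2*pi*e) ≈ 1.4189
rng = np.random.default_rng(0)
samples = rng.normal(size=100_000)

# Discretize into equal-width bins
counts, edges = np.histogram(samples, bins=100)
probs = counts / counts.sum()
width = edges[1] - edges[0]

# -sum(p_i * log(p_i / width)) = H(p) + log(width)
print(st.entropy(probs) + np.log(width))  # ≈ 1.4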

You may also check TensorFlow Probability, which treats distributions essentially as tensors and has an entropy() method on its Distribution class.
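
For example (a sketch, assuming tensorflow-probability is installed):

import tensorflow_probability as tfp

# Normal(0, 1) has differential entropy 0.5*log(2*pi*e) ≈ 1.4189
dist = tfp.distributions.Normal(loc=0., scale=1.)
print(dist.entropy())  # tf.Tensor(1.4189385, shape=(), dtype=float32)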

david.a.
  • Do you mean discretizing the continuous random variable array using scipy.stats.rv_discrete? How is this done for a T-sized vector/time series? The docs only show an example of a uniform binary variable – develarist Aug 04 '20 at 10:31