
I have another question that I was hoping someone could help me with.

I'm using the Jensen-Shannon divergence to measure the similarity between two probability distributions. The similarity scores appear to be correct in the sense that they fall between 0 and 1, given that one uses the base-2 logarithm, with 0 meaning that the distributions are equal.

However, I'm not sure whether there is in fact an error somewhere, and I was wondering whether someone might be able to say 'yes, it's correct' or 'no, you did something wrong'.

Here is the code:

from numpy import zeros, array
from math import sqrt, log


class JSD(object):
    def __init__(self):
        self.log2 = log(2)


    def KL_divergence(self, p, q):
        """ Compute KL divergence of two vectors, K(p || q)."""
        return sum(p[x] * log((p[x]) / (q[x])) for x in range(len(p)) if p[x] != 0.0 or p[x] != 0)

    def Jensen_Shannon_divergence(self, p, q):
        """ Returns the Jensen-Shannon divergence. """
        self.JSD = 0.0
        weight = 0.5
        average = zeros(len(p)) #Average
        for x in range(len(p)):
            average[x] = weight * p[x] + (1 - weight) * q[x]
            self.JSD = (weight * self.KL_divergence(array(p), average)) + ((1 - weight) * self.KL_divergence(array(q), average))
        return 1-(self.JSD/sqrt(2 * self.log2))

if __name__ == '__main__':
    J = JSD()
    p = [1.0/10, 9.0/10, 0]
    q = [0, 1.0/10, 9.0/10]
    print J.Jensen_Shannon_divergence(p, q)

The problem is that I feel that the scores are not high enough when comparing two text documents, for instance. However, this is purely a subjective feeling.

Any help is, as always, appreciated.

Z. Simon
Martyn
  • Maybe try comparing output to [this Matlab script?](https://www.mathworks.com/matlabcentral/fileexchange/20689-jensen-shannon-divergence) Or run it in Octave. – Josiah Hester Apr 08 '13 at 13:22
  • The `if p[x] != 0.0 or p[x] != 0` looks strange. – Janne Karila Apr 08 '13 at 13:31
  • `if p[x] != 0.0 or p[x] != 0` is used to make sure that we don't consider entries which are zero, whether they are floats or integers. Is that what you were referring to, or did you mean that this line is weird full stop? Many thanks. – Martyn Apr 08 '13 at 14:03
  • `p[x] != 0` alone is the same thing, because `0.0 == 0`. That's why I suspected there might be a typo there. – Janne Karila Apr 08 '13 at 17:41

5 Answers


Note that the scipy `entropy` call below, when given two arguments, computes the Kullback-Leibler divergence.

See: http://en.wikipedia.org/wiki/Jensen%E2%80%93Shannon_divergence
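
For reference, the quantity being computed is JSD(P ‖ Q) = 0.5 · KL(P ‖ M) + 0.5 · KL(Q ‖ M), where M = 0.5 · (P + Q) is the mixture of the two distributions; this is the definition given on that Wikipedia page.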

#!/usr/bin/env python
from scipy.stats import entropy
from numpy.linalg import norm
import numpy as np

def JSD(P, Q):
    # Normalize the inputs so that each sums to 1
    _P = P / norm(P, ord=1)
    _Q = Q / norm(Q, ord=1)
    # Mixture distribution M = (P + Q) / 2
    _M = 0.5 * (_P + _Q)
    # JSD(P || Q) = (KL(P || M) + KL(Q || M)) / 2
    return 0.5 * (entropy(_P, _M) + entropy(_Q, _M))
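
As a quick sanity check (my addition, not part of the original answer), continuing the snippet above with the two distributions from the question:

# scipy.stats.entropy uses the natural logarithm by default, so this value is in nats
p = np.array([1.0/10, 9.0/10, 0])
q = np.array([0, 1.0/10, 9.0/10])
print(JSD(p, q))  # ~0.5306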

Also note that the test case in the question looks off: the p distribution does not appear to sum to 1.0.

See: http://www.itl.nist.gov/div898/handbook/eda/section3/eda361.htm

Doug Shore
  • Importing and using `norm` is not needed, as `entropy` will normalize the distributions if they do not add up to 1 (see http://docs.scipy.org/doc/scipy-dev/reference/generated/scipy.stats.entropy.html). However, to compute `_M` like that, `_P` and `_Q` need to be `numpy.ndarray` objects. – Tur1ng Jul 09 '15 at 18:03
  • @Tur1ng note that norm is needed because the calculation of `_M` requires that `_P` and `_Q` are probability distributions (already normalized). Also note that lists are coerced as numpy arrays so this is fine: `[2, 4] / np.array([1, 2])` – Doug Shore Jul 14 '15 at 17:13
  • @DougShore actually, since `scipy.stats.entropy` normalizes the distributions, you don't need to normalize `_P` and `_Q` to compute `_M`, you only need them to sum up to the same value, and you can probably save a few computations. However, this is much more readable like this. On the other hand, I would prefer functions that don't do unnecessary computations, and assume that the input are normalized probabilities. – emem Nov 16 '17 at 09:03
  • So, in @Doug Shore's code, do I need to pass the `P, Q` frequency lists (`list_a` and `list_b`), in my case `list_a = [1, 100, 40, 1200, 0, 4]` and `list_b = [23, 5600, 11, 0, 40, 340]`, un-normalized as you see them above? Or should I normalize them before feeding them into the `JSD(P, Q)` function? – just_learning Feb 17 '22 at 12:52
  • @just_learning the JSD function normalizes the inputs (as probability distributions), so yes, `JSD(list_a, list_b)` will work – Doug Shore Feb 25 '22 at 21:28

Since the Jensen-Shannon distance (`distance.jensenshannon`) has been included in SciPy 1.2, the Jensen-Shannon divergence can be obtained as the square of the Jensen-Shannon distance:

from scipy.spatial import distance

distance.jensenshannon([1.0/10, 9.0/10, 0], [0, 1.0/10, 9.0/10]) ** 2
# 0.5306056938642212
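
Since `distance.jensenshannon` uses the natural logarithm by default, the squared value above is in nats. If a result bounded between 0 and 1 is wanted, as in the question, the `base` argument can be passed (my addition, not part of the original answer):

# base=2 gives the divergence in bits, so it stays between 0 and 1
distance.jensenshannon([1.0/10, 9.0/10, 0], [0, 1.0/10, 9.0/10], base=2) ** 2
# ~0.7655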
Xavier Guihot

Get some data for distributions with known divergence and compare your results against those known values.

BTW: the sum in `KL_divergence` may be rewritten using the built-in `zip` function like this:

sum(_p * log(_p / _q) for _p, _q in zip(p, q) if _p != 0)

This does away with lots of "noise" and is also much more "pythonic". The double comparison with 0.0 and 0 is not necessary.
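
As a sketch of that kind of known-value check (my own example, not from the original answer), the zip-based sum can be compared against a KL divergence worked out by hand:

from math import log

def kl_divergence(p, q):
    # zip-based rewrite suggested above; skips zero entries of p
    return sum(_p * log(_p / _q) for _p, _q in zip(p, q) if _p != 0)

p = [0.75, 0.25]
q = [0.5, 0.5]
print(kl_divergence(p, p))                # 0.0, since KL(p || p) is always zero
print(kl_divergence(p, q))                # ~0.1308
print(0.75 * log(1.5) + 0.25 * log(0.5))  # the same value, computed by hand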

Ber

A general version, for n probability distributions, in Python:

import numpy as np
from scipy.stats import entropy as H


def JSD(prob_distributions, weights, logbase=2):
    # left term: entropy of the weighted mixture distribution
    wprobs = weights[:, np.newaxis] * prob_distributions  # weight each distribution (row-wise)
    mixture = wprobs.sum(axis=0)
    entropy_of_mixture = H(mixture, base=logbase)

    # right term: weighted sum of the individual entropies
    entropies = np.array([H(P_i, base=logbase) for P_i in prob_distributions])
    wentropies = weights * entropies
    sum_of_entropies = wentropies.sum()

    divergence = entropy_of_mixture - sum_of_entropies
    return divergence

# From the original example with three distributions:
P_1 = np.array([1/2, 1/2, 0])
P_2 = np.array([0, 1/10, 9/10])
P_3 = np.array([1/3, 1/3, 1/3])

prob_distributions = np.array([P_1, P_2, P_3])
n = len(prob_distributions)
weights = np.empty(n)
weights.fill(1/n)

print(JSD(prob_distributions, weights))
#0.546621319446
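
(In other words, the function above uses the identity JSD_w(P_1, ..., P_n) = H(sum_i w_i * P_i) - sum_i w_i * H(P_i): the entropy of the weighted mixture minus the weighted sum of the individual entropies.)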
alemol

Explicitly following the math in the Wikipedia article:

import numpy as np

def jsdiv(P, Q):
    """Compute the Jensen-Shannon divergence between two probability distributions.

    Input
    -----
    P, Q : array-like
        Probability distributions of equal length that sum to 1
    """

    def _kldiv(A, B):
        # 0 * log(0/x) terms evaluate to NaN and are skipped
        return np.sum([v for v in A * np.log2(A/B) if not np.isnan(v)])

    P = np.array(P)
    Q = np.array(Q)

    M = 0.5 * (P + Q)

    return 0.5 * (_kldiv(P, M) + _kldiv(Q, M))
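
A quick check with the question's distributions (my addition, not part of the original answer). Since this version uses np.log2, the result is in bits and bounded by 1; numpy will warn about the zero entries, whose NaN terms are filtered out inside _kldiv:

print(jsdiv([1.0/10, 9.0/10, 0], [0, 1.0/10, 9.0/10]))  # ~0.7655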
Ulf Aslak