
I would like to use the Jensen-Shannon divergence as a histogram distance function. I'm implementing a simple image similarity search, and the histograms are normalized RGB color distributions.

I have a question about the Kullback-Leibler divergence formula (on which JS is based): what should I return when pi or qi is zero?

Here is the implementation in F#:

let dKL p q =
    Array.map2 (fun pi qi -> if pi = 0. then ?   // what to return here?
                             elif qi = 0. then ? // and here?
                             else pi * log (pi / qi)) p q
    |> Array.sum

and the Jensen-Shannon distance that uses it:

let dJS p q =
    // m is the pointwise average distribution (p + q) / 2
    let m = Array.map2 (fun pi qi -> (pi + qi) / 2.) p q
    (dKL p m) / 2. + (dKL q m) / 2.

Wikipedia says that it should return 0 when pi = 0 and qi > 0, and that it is undefined when qi = 0 and pi > 0, but for a histogram distance that does not make much sense. What values would make sense in this case?

Edit

Here's the corrected version as per Whatang's answer, for future reference:

// When pi = 0 the term is 0 by the 0 log 0 = 0 convention; returning 0
// explicitly also matters in floating point, where 0. * log (0. / qi)
// evaluates to NaN rather than 0.
let dKL p q =
    Array.map2 (fun pi qi -> if pi = 0. then 0.
                             else pi * log (pi / qi)) p q
    |> Array.sum
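
For anyone trying this out, a quick usage sketch with made-up, already-normalized histograms (relies on the dJS above):

let p = [| 0.5; 0.5; 0.0 |]   // toy 3-bin histograms with
let q = [| 0.0; 0.5; 0.5 |]   // partly disjoint support

// m = [| 0.25; 0.5; 0.25 |]: no zero ever reaches log with a nonzero pi
printfn "dJS(p, q) = %f" (dJS p q)   // 0.346574, i.e. (ln 2) / 2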
Francesco De Vittori
  • I'm curious: I've been taking some stats night classes (for reference: we're studying MLE/MVUE/sufficiency/etc. at this point), but I don't see how you can shoehorn this distribution distance into one about relative frequency. Keep in mind my limited knowledge before you call me silly. – Ritch Melton Apr 03 '12 at 22:39
  • There isn't really a good alternative. From what I am reading, `pi=0 -> 0` is just to avoid `0 * log 0`, which is undefined, and `qi=0 -> undefined` is because otherwise you have division by zero. – Guvante Apr 03 '12 at 23:19
  • There's a question similar to yours with a good answer on the Stats StackExchange: http://stats.stackexchange.com/a/14135 – Jack P. Apr 04 '12 at 12:46
  • @Guvante the issue is what value is meaningful in those cases. When qi is 0 and pi is 0 there's no problem, because 1) the values are equal, so the distance is obviously 0, and 2) it's common to consider 0 log 0 to be 0. On the other hand, the problem is when only qi is 0, but as Whatang has shown, this can never happen in this particular case. – Francesco De Vittori Apr 04 '12 at 14:13
  • @RitchMelton I'm not an expert, but the idea is that a relative frequency distribution is pretty much the same thing as a probability distribution, so Jensen-Shannon, Kullback-Leibler, chi-squared & co. are OK (see the normalization sketch below these comments). The practical implementation I'm testing confirms this: JS works very well (slightly better than chi-squared). – Francesco De Vittori Apr 04 '12 at 14:19
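
To make that last point concrete, here is a minimal sketch of how such a relative-frequency histogram arises; the bin counts are invented for illustration:

// Turn raw bin counts into a relative-frequency (i.e. probability) distribution.
let normalize (counts: float[]) =
    let total = Array.sum counts
    counts |> Array.map (fun c -> c / total)

// Invented 4-bin counts for one color channel of two images.
let histA = normalize [| 10.; 40.; 30.; 20. |]  // [| 0.1; 0.4; 0.3; 0.2 |]
let histB = normalize [| 25.; 25.; 25.; 25. |]  // [| 0.25; 0.25; 0.25; 0.25 |]

Once normalized like this, histA and histB can be fed straight into dJS.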

1 Answer


Since you're using this to build the Jensen-Shannon divergence, the only way you can get qi equal to zero in the calculation of the Kullback-Leibler divergence is if the corresponding pi is also zero. This is because you're really calculating the average of dKL(p,m) and dKL(q,m), where m = (p+q)/2, so mi = 0 implies both pi = 0 and qi = 0.

Expand the definition of dKL to pi log pi - pi log mi, and use the convention/limit 0 log 0 = 0, and you'll see that there's no problem: m can only be zero where p is too.
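
Spelled out (nothing here beyond the previous paragraph, just in symbols):

\[
d_{KL}(p \,\|\, m) = \sum_i \bigl( p_i \log p_i - p_i \log m_i \bigr),
\qquad m_i = \frac{p_i + q_i}{2} \ge \frac{p_i}{2}.
\]

So pi > 0 forces mi > 0 and the logarithm is always defined, while mi = 0 forces pi = qi = 0, where the 0 log 0 = 0 convention makes the term vanish.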

To make a long story short: when you call dKL from dJS, the second clause `elif qi = 0.` will never be executed. Put whatever you like in there (probably a good idea to make it zero, unless you're going to call dKL from somewhere else).
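
If you want to convince yourself, here's a quick sanity-check sketch (mine, not something you need in production): plant a failwith in that branch and feed in two histograms with completely disjoint support, the worst case for zeros.

let dKLChecked p q =
    Array.map2 (fun pi qi -> if pi = 0. then 0.
                             elif qi = 0. then failwith "unreachable from dJS"
                             else pi * log (pi / qi)) p q
    |> Array.sum

let dJSChecked p q =
    let m = Array.map2 (fun pi qi -> (pi + qi) / 2.) p q
    (dKLChecked p m) / 2. + (dKLChecked q m) / 2.

printfn "%f" (dJSChecked [| 1.; 0. |] [| 0.; 1. |])  // 0.693147 (= ln 2), no exception

The exception can never fire because mi = 0 only where both inputs are zero, and those terms are caught by the pi = 0 branch first.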

Whatang