6

I built a wrapped bivariate Gaussian distribution in Python using the equation given here: http://www.aos.wisc.edu/~dvimont/aos575/Handouts/bivariate_notes.pdf. However, I don't understand why my distribution fails to sum to 1 despite incorporating a normalization constant.

For a U x U lattice,

import numpy as np
from math import *

U = 60
m = np.arange(U)
i = m.reshape(U,1)
j = m.reshape(1,U)

sigma = 0.1
ii = np.minimum(i, U-i)
jj = np.minimum(j, U-j)
norm_constant = 1/(2*pi*sigma**2)
xmu = (ii-0)/sigma; ymu = (jj-0)/sigma
rhs = np.exp(-.5 * (xmu**2 + ymu**2))
ker = norm_constant * rhs

>>> ker.sum() # each grid cell has area 1
15.915494309189533

I'm certain there's something fundamental missing in the way I'm thinking about this, and I suspect that some sort of additional normalization is needed, although I can't reason my way around it.

UPDATE:

Thanks to others' insightful suggestions, I rewrote my code to apply L1 normalization to the kernel. However, it appears that, in the context of 2D convolution via FFT, keeping the range as [0, U] still returns a convincing result:

import numpy as np
import matplotlib.pyplot as plt

U = 100
Ukern = np.copy(U)
#Ukern = 15

m = np.arange(U)
i = m.reshape(U,1)
j = m.reshape(1,U)

sigma = 2.
ii = np.minimum(i, Ukern-i)
jj = np.minimum(j, Ukern-j)
xmu = (ii-0)/sigma; ymu = (jj-0)/sigma
ker = np.exp(-.5 * (xmu**2 + ymu**2))
ker /= np.abs(ker).sum()

''' Point Density '''
ido = np.random.randint(U, size=(10,2))  # 10 random lattice points
og = np.zeros((U,U))
np.add.at(og, (ido[:,0], ido[:,1]), 1)

''' Convolution via FFT and inverse-FFT '''
v1 = np.fft.fft2(ker)
v2 = np.fft.fft2(og)
v0 = np.fft.ifft2(v2*v1)
dd = np.abs(v0)

plt.plot(ido[:,1], ido[:,0], 'ko', alpha=.3)
plt.imshow(dd, origin='lower')
plt.show()

[plot of the convincing result]

On the other hand, sizing the kernel using the commented-out line gives this incorrect plot:

[incorrect plot]

neither-nor
  • I don't fully understand why you need the `np.minimum(i, U-i)`. What are you trying to achieve there? – Praveen Jan 11 '16 at 10:24
  • 1
    Also, could you define what you mean by "wrapping" here? – Praveen Jan 11 '16 at 11:14
  • @Praveen imaluengo was correct when suspecting that I'm trying to build a Gaussian kernel -- it represents individual movement range and I convolve it with discrete population distribution to get an estimate of the population density surface. The minimum function sets the peak of the kernel at the origin, and the kernel value decreases with distance to the origin. Therefore, "wrapping" implies that the kernel "wraps" around the UxU lattice edges, resulting in a plot with semicircles in four corners. – neither-nor Jan 11 '16 at 19:44

2 Answers

5

NOTE: As stated in the comments below, this solution is only valid if you are trying to build a Gaussian convolution kernel (or Gaussian filter) for image-processing purposes. It is not a properly normalized Gaussian density function, but it is the form used to remove Gaussian noise from images.


You are missing the L1 normalization:

ker /= np.abs(ker).sum()

This will make your kernel behave like an actual density function. Since the values on your grid can vary a lot in magnitude, the above normalization step is needed.

In fact, the Gaussian normalization constant you have could be omitted in favour of the L1 norm above. If I'm not wrong, you are trying to create a Gaussian convolution kernel, and the above is the usual normalization technique applied to it.
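For instance, applied to the kernel from the question (a minimal sketch, with the constant dropped and only the L1 norm applied):

import numpy as np

U = 60
m = np.arange(U)
i = m.reshape(U, 1)
j = m.reshape(1, U)

sigma = 0.1
ii = np.minimum(i, U - i)    # wrapped distance to the origin
jj = np.minimum(j, U - j)
ker = np.exp(-0.5 * ((ii / sigma)**2 + (jj / sigma)**2))
ker /= np.abs(ker).sum()     # L1 normalization

print(ker.sum())             # 1.0 (up to floating-point error)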

Your second mistake, as @Praveen has stated, is that you need to sample the grid from [-U//2, U//2]. You can do that as:

i, j = np.mgrid[-U//2:U//2+1, -U//2:U//2+1]

Last, if what you are trying to do is build a Gaussian filter, the size of the kernel is usually estimated from sigma (to avoid zeros far from the centre) as U//2 <= t * sigma, where t is a truncation parameter, usually set to t=3 or t=4.
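A minimal sketch of that sizing rule (the names t and radius are illustrative):

import numpy as np

sigma = 2.0
t = 4                                 # truncation parameter
radius = int(t * sigma)               # kernel half-width, i.e. U // 2
i, j = np.mgrid[-radius:radius + 1, -radius:radius + 1]

ker = np.exp(-0.5 * (i**2 + j**2) / sigma**2)
ker /= ker.sum()                      # L1 normalization

print(ker.shape)                      # (17, 17) for sigma=2, t=4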

Imanol Luengo
  • I don't think this is a good idea. The OP seems to be trying to create a probability distribution (as per the notes being followed). Blindly normalizing the kernel will yield a valid probability density, but will destroy many of its statistical properties. – Praveen Jan 11 '16 at 10:27
  • @Praveen And yet an L1-normalized Gaussian kernel is what is used in image processing to remove Gaussian noise from an image. I do agree that it doesn't properly preserve statistical properties, but if I'm not wrong and what the OP wants is a Gaussian filter (and not a Gaussian model), it is the way to go. – Imanol Luengo Jan 11 '16 at 10:38
  • @Praveen, thanks for the comments. You helped me make it clearer (for the particular case). You also have mine, as your answer is actually the one modelling a real Gaussian (and we don't know yet what the OP is after :p). This kind of discussion is quite helpful, as one always learns something new. Thanks! – Imanol Luengo Jan 11 '16 at 11:36
  • @imaluengo I am indeed creating a gaussian kernel for convolution. You read my intention well! :) Thank you for your continual inputs and edits! I am however still a bit confused about the latter two points you made: why is it necessary to sample the grid from [-U//2, U//2], and why is it wrong to do it my way, from [0, U]? – neither-nor Jan 11 '16 at 19:57
  • @neither-nor It is because, if you set the range as `[0,U]`, you are essentially centering the Gaussian filter on the top-left corner pixel of the window. Instead, when you apply a Gaussian convolution, the kernel is centered on every pixel of the image (that pixel becomes the central pixel of the kernel, the `(0,0)`). This centers the bivariate Gaussian filter on the center of the window (the pixel of interest) and yields the Gaussian-weighted mean value (the further a pixel is from the center, the smaller its weight). – Imanol Luengo Jan 11 '16 at 20:06
  • @neither-nor the 2 images @Praveen has posted show your actual distribution: a Gaussian centered at the bottom-left corner (or top-left if you invert the `y` axis, as is common in image processing); and the desired distribution (centered in the middle). – Imanol Luengo Jan 11 '16 at 20:08
  • @imaluengo I just implemented 2d Gaussian convolution using both range settings in my update. Although what you said makes more intuitive sense, somehow the [0,U] range works better numerically; I'm not sure why... – neither-nor Jan 11 '16 at 21:20
  • @neither-nor another thing: by doing `v0 = np.fft.ifft2(v1*v2)`, you are convolving your kernel with an image, not an image with a kernel. That line should be `v0 = np.fft.ifft2(v2*v1)`. And for the test, make `v2` bigger (100x100) and `v1` (the kernel) smaller (15x15); they don't need to have the same shape. Last, consider using `rfft2` and `irfft2` (use only the real part; see the sketch after this thread). – Imanol Luengo Jan 11 '16 at 21:49
  • @imaluengo Just implemented suggestions in update. Perhaps this doesn't work well in Python? – neither-nor Jan 11 '16 at 22:16
  • 1
    @neither-not No, it works in python. The result is completely fine. You are just convolving in the fourier domain, not in the image domain (by a missclick I deleted my old comment). I'm not at a computer right now (tablet), I'll update the post tomorrow with 2 examples of both fourier and sliding window convolutions. – Imanol Luengo Jan 11 '16 at 22:20
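A minimal sketch of the setup described in the comments above (assuming a 15x15 kernel and a 100x100 image; the `s` argument of `np.fft.rfft2` zero-pads its input to the given shape, and the final roll re-centers the result):

import numpy as np

U, Ukern, sigma = 100, 15, 2.0
r = Ukern // 2

# Small kernel centered on its middle pixel
i, j = np.mgrid[-r:r + 1, -r:r + 1]
ker = np.exp(-0.5 * (i**2 + j**2) / sigma**2)
ker /= ker.sum()                        # L1 normalization

# Sparse point image
og = np.zeros((U, U))
ido = np.random.randint(U, size=(10, 2))
np.add.at(og, (ido[:, 0], ido[:, 1]), 1)

# Zero-pad the kernel to the image shape, convolve, and roll so the
# kernel centre lines up with each point
v1 = np.fft.rfft2(ker, s=og.shape)
v2 = np.fft.rfft2(og)
dd = np.fft.irfft2(v2 * v1, s=og.shape)
dd = np.roll(np.roll(dd, -r, axis=0), -r, axis=1)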
5

Currently, the (much zoomed-in) contour plot of your ker looks like this:

[Contour plot of current kernel]

As you can see, this looks nothing like a Gaussian kernel. Most of your function dies off between the grid points 0 and 1. Looking at the kernel itself reveals that all values do indeed die off really, really fast:

>>> ker[0:5, 0:5]
array([[  1.592e+001,   3.070e-021,   2.203e-086,   5.879e-195,   0.000e+000],
       [  3.070e-021,   5.921e-043,   4.248e-108,   1.134e-216,   0.000e+000],
       [  2.203e-086,   4.248e-108,   3.048e-173,   8.136e-282,   0.000e+000],
       [  5.879e-195,   1.134e-216,   8.136e-282,   0.000e+000,   0.000e+000],
       [  0.000e+000,   0.000e+000,   0.000e+000,   0.000e+000,   0.000e+000]])

The sum value of 15.915 that you get is basically just ker[0, 0]. What all this tells you is that you aren't constructing your grid correctly.
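As a quick check, the normalization constant alone reproduces that number:

>>> 1 / (2 * np.pi * 0.1**2)
15.915494309189533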

Remember that when creating the kernel on a computer, you'll have to sample it at appropriate points. Sampling it too coarsely will result in your sum not coming out right.
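To see the sampling issue in one dimension first (a minimal sketch): a plain sum of density samples only approximates the integral once each sample is weighted by the grid spacing.

import numpy as np

sigma = 0.1
x = np.linspace(-0.5, 0.5, 60)
dx = x[1] - x[0]                  # grid spacing
pdf = np.exp(-0.5 * (x / sigma)**2) / (sigma * np.sqrt(2 * np.pi))

print(pdf.sum())                  # ~59: a plain sum ignores the spacing
print(pdf.sum() * dx)             # ~1.0: a proper Riemann sum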

So firstly, if you want a full density centered around mu=0, you'll have to take i and j from -U // 2 to U // 2. But to solve your resolution problem, I recommend taking U points between -0.5 and 0.5.

import numpy as np
import matplotlib.pyplot as plt

U = 60
m = np.linspace(-0.5, 0.5, U)    # 60 points between -0.5 and 0.5
delta = m[1] - m[0]              # delta^2 is the area of each grid cell
(x, y) = np.meshgrid(m, m)       # Create the mesh

sigma = 0.1
norm_constant = 1 / (2 * np.pi * sigma**2)

rhs = np.exp(-.5 * (x**2 + y**2) / sigma**2)
ker = norm_constant * rhs
print(ker.sum() * delta**2)

plt.contour(x, y, ker)
plt.axis('equal')
plt.show()

This yields a sum close to 1.0, and a kernel centered at mu=0 as expected.

[Contour plot of corrected kernel]

Knowing what range to choose (-0.5 to 0.5 in this case) depends on your function. For instance, if you now take sigma = 2, you'll find that your sum won't work out again, because the range now covers only a tiny, nearly flat fraction of the density. Setting your range to be a function of your parameters (something like -5 * sigma to 5 * sigma) might be the best option.
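A minimal sketch of that idea (the function name gaussian_grid_sum and the width parameter are illustrative):

import numpy as np

def gaussian_grid_sum(sigma, U=60, width=5):
    """Sum a 2-D Gaussian density sampled on [-width*sigma, width*sigma]^2."""
    m = np.linspace(-width * sigma, width * sigma, U)
    delta = m[1] - m[0]
    x, y = np.meshgrid(m, m)
    ker = np.exp(-0.5 * (x**2 + y**2) / sigma**2) / (2 * np.pi * sigma**2)
    return ker.sum() * delta**2

print(gaussian_grid_sum(0.1))   # ~1.0
print(gaussian_grid_sum(2.0))   # ~1.0, even though sigma changed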

Praveen
  • If you feel like it, you could add the line `plt.axis('equal')` so the images have an aspect ratio of 1:1 – heltonbiker Jan 11 '16 at 11:15
  • @heltonbiker True. I just didn't consider it. Let me improve this a bit then. – Praveen Jan 11 '16 at 11:16
  • @Praveen Even though I'm not creating a statistically valid probability density here, your inputs are extremely helpful and will surely be useful to me soon enough! – neither-nor Jan 11 '16 at 19:59