Applying the functions scipy.ndimage.filters.gaussian_filter and scipy.stats.gaussian_kde over a given set of data can give very similar results if the sigma
and bw_method
parameters in each function respectively are chosen adequately.
For example, I can obtain for a random 2D distribution of points the following plots by setting sigma=2.
in the gaussian_filter
(left plot) and bw_method=sigma/30.
in the gaussian_kde
(right plot):
(The MWE is at the bottom of the question)
There's obviously a relation between these parameters since one applies a Gaussian filter and the other one a Gaussian Kernel Density Estimator on the data.
The definition of each parameter is:
sigma : scalar or sequence of scalars Standard deviation for Gaussian kernel. The standard deviations of the Gaussian filter are given for each axis as a sequence, or as a single number, in which case it is equal for all axes.
This one I can understand given the definition of the Gaussian operator:
- scipy.stats.gaussian_kde,
bw_method
:
bw_method : str, scalar or callable, optional The method used to calculate the estimator bandwidth. This can be ‘scott’, ‘silverman’, a scalar constant or a callable. If a scalar, this will be used directly as kde.factor. If a callable, it should take a gaussian_kde instance as only parameter and return a scalar. If None (default), ‘scott’ is used. See Notes for more details.
In this case let's assume the input for bw_method
is a scalar (float) so as to be comparable with sigma
. Here's where I get lost since I can find no information about this kde.factor
parameter anywhere.
What I'd like to know is the precise mathematical equation that connects both these parameters (ie: sigma
and bw_method
when a float is used) if possible.
MWE:
import numpy as np
from scipy.stats import gaussian_kde
from scipy.ndimage.filters import gaussian_filter
import matplotlib.pyplot as plt
def rand_data():
return np.random.uniform(low=1., high=200., size=(1000,))
# Generate 2D data.
x_data, y_data = rand_data(), rand_data()
xmin, xmax = min(x_data), max(x_data)
ymin, ymax = min(y_data), max(y_data)
# Define grid density.
gd = 100
# Define bandwidth
bw = 2.
# Using gaussian_filter
# Obtain 2D histogram.
rang = [[xmin, xmax], [ymin, ymax]]
binsxy = [gd, gd]
hist1, xedges, yedges = np.histogram2d(x_data, y_data, range=rang, bins=binsxy)
# Gaussian filtered histogram.
h_g = gaussian_filter(hist1, bw)
# Using gaussian_kde
values = np.vstack([x_data, y_data])
# Data 2D kernel density estimate.
kernel = gaussian_kde(values, bw_method=bw / 30.)
# Define x,y grid.
gd_c = complex(0, gd)
x, y = np.mgrid[xmin:xmax:gd_c, ymin:ymax:gd_c]
positions = np.vstack([x.ravel(), y.ravel()])
# Evaluate KDE.
z = kernel(positions)
# Re-shape for plotting
z = z.reshape(gd, gd)
# Make plots.
fig, (ax1, ax2) = plt.subplots(1, 2)
# Gaussian filtered 2D histograms.
ax1.imshow(h_g.transpose(), origin='lower')
ax2.imshow(z.transpose(), origin='lower')
plt.show()