scipy.stats : bandwidth factor in gaussian kernel density estimator

Question

I generated a 2D Gaussian distribution (uncorrelated data):

import numpy as np
dist2 = np.array([np.random.normal(loc=10, scale=3, size=50000), np.random.normal(loc=5, scale=2, size=50000)])

I then divided the covariance matrix by the bandwidth factor, because the documentation describes the covariance attribute as "The covariance matrix of dataset, scaled by the calculated bandwidth (kde.factor)" (https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gaussian_kde.html):

from scipy.stats import gaussian_kde
# Use a kernel density estimator to produce local-counts in this space, and grid them to plot.
k = gaussian_kde(dist2)
k.covariance/k.factor

The diagonal elements are not the squares of the sigmas (9 and 4), as I expected.

I think there is something I have not understood on this bandwidth factor.

Any explanation would be appreciated. Thanks for help.

The covariance factor is implemented here so that `k.covariance/k.factor**2` ~ `np.cov(dist2)`. See here https://stackoverflow.com/questions/23630515/getting-bandwidth-used-by-scipys-gaussian-kde-function — Max Pierini, Apr 23 '21 at 07:10

score 1 · Accepted Answer · answered Apr 23 '21 at 12:31

In scipy.stats.gaussian_kde, the kernel covariance is the data covariance multiplied by the square of the bandwidth factor, so k.covariance / k.factor**2 (not k.covariance / k.factor) is approximately equal to np.cov(dist2).

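See here for details: Getting bandwidth used by SciPy's gaussian_kde function (https://stackoverflow.com/questions/23630515/getting-bandwidth-used-by-scipys-gaussian-kde-function)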
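As a quick check, here is a minimal sketch that reproduces the question's setup and compares the rescaled kernel covariance with np.cov. It assumes the default bandwidth (Scott's rule) and no weights. The seeded generator rng and the name data_cov are my own additions for illustration and are not part of the original post.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Seeded generator so the run is reproducible (an assumption; the question
# used the unseeded np.random.normal).
rng = np.random.default_rng(0)
dist2 = np.array([
    rng.normal(loc=10, scale=3, size=50000),
    rng.normal(loc=5, scale=2, size=50000),
])

k = gaussian_kde(dist2)  # default bw_method is Scott's rule

# With the default rule, the factor is n ** (-1 / (d + 4)).
print(k.factor, k.n ** (-1.0 / (k.d + 4)))

# Undo the bandwidth scaling: divide by the factor squared, not the factor.
data_cov = k.covariance / k.factor**2

print(data_cov)       # diagonal close to 3**2 and 2**2
print(np.cov(dist2))  # should agree with data_cov
```

Dividing by k.factor only once leaves one power of the factor in the result, which is why the diagonal in the question did not match the squared sigmas.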

Max Pierini
