I have dataset like the following fromat and im trying to find out the Kernel density estimation with optimal bandwidth.
data = np.array([[1, 4, 3], [2, .6, 1.2], [2, 1, 1.2],
[2, 0.5, 1.4], [5, .5, 0], [0, 0, 0],
[1, 4, 3], [5, .5, 0], [2, .5, 1.2]])
but I couldn't figure out how to approach it. also how to find the Σ matrix.
UPDATE
I tried KDE function from scikit-learn toolkits to find out univariate(1D) kde,
# kde function
def kde_sklearn(x, x_grid, bandwidth):
kde = KernelDensity(kernel='gaussian', bandwidth=bandwidth).fit(x)
log_pdf = kde.score_samples(x_grid[:, np.newaxis])
return np.exp(log_pdf)
# optimal bandwidth selection
from sklearn.grid_search import GridSearchCV
grid = GridSearchCV(KernelDensity(), {'bandwidth': np.linspace(.1, 1.0, 30)}, cv=20)
grid.fit(x)
bw = grid.best_params_
# pdf using kde
pdf = kde_sklearn(x, x_grid, bw)
ax.plot(x_grid, pdf, label='bw={}'.format(bw))
ax.legend(loc='best')
plt.show()
Can any one help me to extend this to multivariate / in this case 3D data?