Suppose I had two 2D sets of 1000 samples that look something like this:

[scatter plot of two overlapping point clouds; generated by the code below]
I'd like a metric for the amount of difference between the two distributions, and thought the KL divergence would be suitable.
I've been looking at sp.stats.entropy(); however, from this answer (Interpreting scipy.stats.entropy values) it appears I need to convert the samples to a pdf first. How can one do this using the four 1D arrays? (A rough sketch of what I was considering follows the data-generation code.)
The example data above was generated as follows:
import numpy as np
import matplotlib.pyplot as plt

# two 2D Gaussian clouds: different means, same per-axis spread
dist1_x = np.random.normal(0, 10, 1000)
dist1_y = np.random.normal(0, 5, 1000)
dist2_x = np.random.normal(3, 10, 1000)
dist2_y = np.random.normal(4, 5, 1000)

plt.scatter(dist1_x, dist1_y)
plt.scatter(dist2_x, dist2_y)
plt.show()
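For the pdf conversion, what I was considering is binning both sample sets on a common grid with np.histogram2d and feeding the flattened counts to sp.stats.entropy() (which, given two arguments, normalizes each to sum to 1 and returns the KL divergence). This is only a sketch; the grid extents, the bin count of 20, and the smoothing constant are arbitrary choices on my part:

from scipy import stats

# bin both sample sets on the same grid so the discrete pdfs are comparable
xedges = np.linspace(-40, 40, 21)
yedges = np.linspace(-25, 25, 21)
hist1, _, _ = np.histogram2d(dist1_x, dist1_y, bins=[xedges, yedges])
hist2, _, _ = np.histogram2d(dist2_x, dist2_y, bins=[xedges, yedges])

# add a tiny constant so q is never zero where p is non-zero,
# which would make the KL divergence infinite
p = (hist1 + 1e-9).ravel()
q = (hist2 + 1e-9).ravel()

# with two arguments, stats.entropy() returns D(p || q)
print(stats.entropy(p, q))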
For my real data I only have the samples, not the distributions they came from (although, if need be, one could estimate the mean and variance from the samples and assume they are Gaussian). Is it possible to calculate the KL divergence like this?
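If assuming Gaussians is the way to go, I believe the KL divergence between two multivariate normals has a closed form, so one could fit a mean and covariance to each sample set and evaluate it directly. A sketch of what I mean (gaussian_kl is just a name I made up):

def gaussian_kl(samples_p, samples_q):
    # fit a Gaussian (mean + covariance) to each (n, d) sample array
    mu_p, mu_q = samples_p.mean(axis=0), samples_q.mean(axis=0)
    cov_p = np.cov(samples_p, rowvar=False)
    cov_q = np.cov(samples_q, rowvar=False)
    d = mu_p.size
    cov_q_inv = np.linalg.inv(cov_q)
    diff = mu_q - mu_p
    # closed-form KL(P || Q) for two multivariate normals
    return 0.5 * (np.trace(cov_q_inv @ cov_p)
                  + diff @ cov_q_inv @ diff
                  - d
                  + np.log(np.linalg.det(cov_q) / np.linalg.det(cov_p)))

samples1 = np.column_stack([dist1_x, dist1_y])
samples2 = np.column_stack([dist2_x, dist2_y])
print(gaussian_kl(samples1, samples2))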