I'm using kernel density estimation with a custom metric. The metric is obviously slower than the built-in euclidean distance, but works fine. When doing

kde=KernelDensity(...)
kde.fit(X)

I get results in a reasonable amount of time.

When I then calculate

surface=np.exp(kde.score_samples(meshgrid))

where meshgrid is a numpy array of size (about) 64000x2, kde calculates the distance for each point in the grid. I seem to basically misunderstand why that's necessary... The density is already calculated by the .fit() method, and score_samples "should" simply evaluate the density at each point in the grid - right? Am I overlooking something?

When I do all the calculations with the built-in euclidean metric, the computation is fairly fast, with no hint that .score_samples iterates over gazillions of points...

Any hint is appreciated.

1 Answer

You need to compute the density at the meshgrid points if you want to score the samples: .fit() does not evaluate the density anywhere, it only stores the training data (and builds the tree), so score_samples has to evaluate the kernel sum - and therefore the distances - at every query point. Depending on how you pass the metric, this will be done using a brute-force approach, which means computing the distances to all the points.
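To see why, here is a rough sketch of what a brute-force evaluation of a Gaussian KDE does (an illustration, not sklearn's actual implementation):

import numpy as np
from scipy.special import logsumexp

def kde_log_density_brute(X_train, X_query, h):
    # The log-density at each query point is a sum of kernel terms over
    # *all* training points, so every query/train distance is needed.
    d = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=-1)
    n, dim = X_train.shape
    log_norm = -np.log(n) - dim * np.log(h) - 0.5 * dim * np.log(2.0 * np.pi)
    return logsumexp(-0.5 * (d / h) ** 2, axis=1) + log_norm

With a 64000-point grid that is 64000 × n_train distance evaluations, and with a "pyfunc" metric each one is a Python function call instead of compiled code, which is why the euclidean version feels so much faster.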

You can use your metric with the built-in BallTree, which might save you some computation, but that depends on your dataset and the metric you use.
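For concreteness, a minimal sketch of that setup (the data, bandwidth, and distance function here are made-up placeholders):

import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))             # illustrative training data
meshgrid = rng.uniform(-3, 3, (64000, 2))  # query grid, as in the question

def my_distance(a, b):
    # stand-in for the slow custom metric; a and b are 1-D point arrays
    return np.sqrt(np.sum((a - b) ** 2))

kde = KernelDensity(
    bandwidth=0.5,
    algorithm="ball_tree",                 # tree can prune groups of points
    metric="pyfunc",
    metric_params={"func": my_distance},
)
kde.fit(X)
surface = np.exp(kde.score_samples(meshgrid))

The ball tree bounds whole groups of training points, so score_samples can sometimes skip an entire node at once; how much that saves depends on the bandwidth, the data distribution, and on the custom function being a proper metric (the tree relies on the triangle inequality).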

Andreas Mueller
  • What do you mean exactly by "how you pass the metric"? I wrote a class that does some precomputations (and also calculates the distance), and then pass one method of that class in a dict to metric_params in KernelDensity. I also use the built-in ball tree. The call is something like `KernelDensity(...metric="pyfunc",metric_params={"func":fancyClass.distanceMethod,"more_metric params":more_values},algorithm="ball_tree")` –  May 16 '15 at 19:31
  • Well, that is exactly what I meant. That is the best you can do, I think. – Andreas Mueller May 18 '15 at 16:06