
I have a set of data distributed on a sphere, and I am trying to understand which metric must be given to the DBSCAN function provided by scikit-learn. It cannot be the Euclidean metric, because the points are not distributed in a Euclidean space. Is there a metric implemented in the sklearn package for such cases, or is dividing the data into small subsets the easiest (if long and tedious) way to proceed?
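For concreteness, this is roughly what I am doing now (coords stands for my array of positions; the eps and min_samples values are just placeholders):

    from sklearn.cluster import DBSCAN

    # coords: (n_samples, 2) array of (right ascension, declination) pairs
    # the default metric='euclidean' is wrong near the poles and across
    # the RA = 0/360 wrap-around, which is exactly my problem
    db = DBSCAN(eps=0.1, min_samples=5).fit(coords)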

P.S. I am a noob at Python.

P.P.S. If I "precompute" the metric, in what form do I have to submit my precomputed distances? Like this?

              event1                     event2                     ...
    event1    0                          distance(event1,event2)    ...
    event2    distance(event1,event2)    0                          ...
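In code, I picture building the matrix with my existing routine roughly like this (angular_separation is a stand-in for my own distance program, not a library function):

    import numpy as np

    # ra, dec: 1-D arrays of right ascension and declination
    n = len(ra)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = angular_separation(ra[i], dec[i], ra[j], dec[j])
            D[i, j] = d
            D[j, i] = d  # symmetric, with zeros on the diagonal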

Please help!

  • I do not understand your P.P.S. at all... which metric do you want to use? – Has QUIT--Anony-Mousse Nov 16 '14 at 19:04
  • The data I have to process are distributed on the celestial sphere; the positions are given as right ascension and declination. I already have a program that computes the distances between points, but I have no idea how to feed those already-computed distances into DBSCAN as a "precomputed metric", so I wondered whether there was already a metric that met my needs. – maythemoonshine Nov 16 '14 at 20:16

1 Answer


Have you tried metric="precomputed"?

Then pass the distance matrix to the DBSCAN.fit function instead of the data.
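A minimal sketch (D is your symmetric n_samples x n_samples distance matrix; the eps and min_samples values are only illustrative, and eps must be in the same units as your distances):

    from sklearn.cluster import DBSCAN

    # D: precomputed (n_samples, n_samples) array of pairwise distances
    db = DBSCAN(eps=0.05, min_samples=10, metric='precomputed').fit(D)
    labels = db.labels_  # cluster index per sample, -1 marks noise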

From the documentation:

X : array [n_samples, n_samples] or [n_samples, n_features]

Array of distances between samples, or a feature array. The array is treated as a feature array unless the metric is given as ‘precomputed’.
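If your scikit-learn version supports it, there is also a built-in "haversine" (great-circle) metric that works directly on angular coordinates and would spare you the precomputation; a sketch, assuming coordinates in radians with declination playing the role of latitude:

    import numpy as np
    from sklearn.cluster import DBSCAN

    # the haversine metric expects (latitude, longitude) order, in radians
    coords = np.column_stack([dec, ra])
    db = DBSCAN(eps=np.radians(0.5), min_samples=10,
                metric='haversine', algorithm='ball_tree').fit(coords)

Note that eps is then an angle in radians (0.5 degrees here is only illustrative).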

Has QUIT--Anony-Mousse