1

I test my image set on DBSCAN algorithm in scikit-learn python module . There are alternatives for similarity computing:

# Compute similarities
D = distance.squareform(distance.pdist(X))
S = 1 - (D / np.max(D))

A weighted measure or something like that i could try, examples?

Rohan Nadagouda
  • 462
  • 7
  • 18
postgres
  • 2,242
  • 5
  • 34
  • 50

3 Answers3

3

There exists a generalization of DBSCAN, known as "Generalized DBSCAN".

Actually for DBSCAN you do not even need a distance. Which is why it actually does not make sense to compute a similarity matrix in the first place.

All you need is a predicate "getNeighbors", that computes objects you consider as neighbors.

See: in DBSCAN, the distance is not really used, except to test whether an object is a neighbor or not. So all you need is this boolean decision.

You can try the following approach: initialize the matrix with all 1s. For any two objecs that you consider similar for your application (we can't help you a lot on that, without knowing your application and data), fill the corresponding cells with 0. Then run DBSCAN with epsilon = 0.5, and obviously DBSCAN will consider all the 0s as neighbors.

Has QUIT--Anony-Mousse
  • 76,138
  • 12
  • 138
  • 194
0

You can use whatever similarity matrix you like. It just need to based on a valid distance (symmetric, positive semi-definite).

ogrisel
  • 39,309
  • 12
  • 116
  • 125
  • I don't know other similarity matrices, any example? or list where can i choose? – postgres Feb 13 '13 at 14:55
  • Cosine similarity between sparse positive vectors (e.g. word frequencies), heat kernel or RBF kernel, similarities based on the l1 (Manhattan) norm instead of the euclidean norm... – ogrisel Feb 13 '13 at 22:20
  • 1
    Actually no, it doesn't require to be a valid distance/metric. DBSCAN requires just a binary "isNeighbor" information. There is *no* symmetry requirement, technically. You could use a random matrix and DBSCAN would still work. (The results with proper distances are usually better though). @postgres you need to figure out what is "similar" for your particular task! – Has QUIT--Anony-Mousse Feb 14 '13 at 07:13
  • Actually, the `DBSCAN` estimator wants distances, not similarities. – Fred Foo Feb 14 '13 at 11:33
0

I believe DBSCAN estimator wants distances and not similarities. But again when it comes to strings it will require a similarity matrix, which can even be a line of code for matching the equality between two strings. Therefore it depends on you how you use the similarity matrix and distinguish between a neighbor and non neighbor objects.