I ran into a problem using sklearn.cluster.DBSCAN. If I use DBSCAN(metric="russellrao"), what format should the data be in? I tried two ways and both return pred = [-1 -1 -1 ..., -1 -1 -1]. The two data formats are shown below.

from sklearn.cluster import DBSCAN

npy = df2.values
y_pred = DBSCAN(metric="russellrao").fit_predict(npy)

1. npy = (screenshot of the first data format, not reproduced here)

2. npy = (screenshot of the second data format, not reproduced here)

print(y_pred)  # [-1 -1 -1 ..., -1 -1 -1]

So, which format is the right one?

Has QUIT--Anony-Mousse
Ao.L
  • Welcome to SO. Please avoid screenshots; copy-paste the data as text and format it accordingly. Screenshots make it harder to get help, are heavier in terms of bytes, and are not handy at all. – jlandercy Dec 21 '18 at 16:05

1 Answer

You need to choose the other DBSCAN parameters appropriately.

IMHO, sklearn should not have defaults for them. In particular, epsilon (the eps parameter) depends very much on your data set and metric, so the default will almost always be a bad choice. Instead of providing bad defaults, it should force users to choose the parameters.
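
To illustrate the point about the defaults, here is a minimal sketch. The toy boolean array is an assumption (the original data was only posted as screenshots), and eps=0.75 / min_samples=3 are values picked for this toy data only, not recommendations. algorithm="brute" is also my choice here, so that distances come directly from the pairwise computation. The Russell-Rao dissimilarity is (n - number of positions where both rows are True) / n, so sparse rows end up far apart and the default eps=0.5 can leave every point as noise.

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.cluster import DBSCAN

# Toy boolean data (an assumption, not the asker's data):
# 5 rows with features 0-5 set, 5 rows with features 10-15 set, 20 features.
X = np.zeros((10, 20), dtype=bool)
X[:5, 0:6] = True
X[5:, 10:16] = True

# Inspect the pairwise Russell-Rao dissimilarities before choosing eps.
dists = pdist(X, metric="russellrao")
print(np.unique(dists))

# Defaults (eps=0.5, min_samples=5): no point has enough close neighbours.
default_labels = DBSCAN(metric="russellrao", algorithm="brute").fit_predict(X)
print(default_labels)

# eps chosen from the observed distances: just above the within-group value.
tuned_labels = DBSCAN(
    eps=0.75, min_samples=3, metric="russellrao", algorithm="brute"
).fit_predict(X)
print(tuned_labels)
```

On this toy data the default call labels every row -1, while the tuned call separates the two groups. Looking at the distance distribution first is one reasonable way to pick eps for your own data.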

Has QUIT--Anony-Mousse
  • Yes, I know, but I'm just not sure whether the metric parameter accepts "russellrao". How can I find all the supported values? The documentation (http://sklearn.lzjqsdd.com/modules/generated/sklearn.cluster.DBSCAN.html#sklearn.cluster.DBSCAN) is too brief. – Ao.L Mar 26 '18 at 02:14
  • If you follow the rabbit into the documentation, you can get a list of efficiently supported metrics. Or you add your own. – Has QUIT--Anony-Mousse Mar 26 '18 at 08:29
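
For reference, a small sketch of both options mentioned in the last comment: looking up the supported metric names, and passing your own metric. It assumes the sklearn.neighbors.VALID_METRICS dictionary is the relevant list for the brute-force neighbour search; my_russellrao is a hypothetical hand-written function used only for illustration, and the toy data and eps / min_samples values are assumptions as well.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import VALID_METRICS

# Metric names accepted by the brute-force neighbour search.
brute_metrics = sorted(VALID_METRICS["brute"])
print("russellrao" in brute_metrics)


def my_russellrao(a, b):
    """Hypothetical custom metric: Russell-Rao dissimilarity of two rows."""
    n = a.size
    both_true = np.sum((a > 0) & (b > 0))
    return (n - both_true) / n


# Toy boolean data (an assumption): two groups of 5 rows, 20 features.
X = np.zeros((10, 20), dtype=bool)
X[:5, 0:6] = True
X[5:, 10:16] = True

# A callable can be passed as metric; eps and min_samples fit this toy data only.
custom_labels = DBSCAN(
    eps=0.75, min_samples=3, metric=my_russellrao, algorithm="brute"
).fit_predict(X)
print(custom_labels)
```

A Python callable is evaluated pair by pair, so it is much slower than a built-in metric name on larger data sets.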