I can't get OPTICS in ELKI to work on my dataset, so I tried it on a very small subset of my data, but that didn't work either.

I want to feed a precomputed distance matrix into ELKI and have it compute the reachability distances of my points, but every point after the first gets a reachability distance of 0:

ID=1 reachdist=Infinity predecessor=1
ID=2 reachdist=0.0 predecessor=1
ID=4 reachdist=0.0 predecessor=1
ID=3 reachdist=0.0 predecessor=1

My ELKI arguments were as follows:

Running: -dbc DBIDRangeDatabaseConnection -idgen.start 1 -idgen.count 4 -algorithm clustering.optics.OPTICSList -algorithm.distancefunction external.FileBasedDoubleDistanceFunction -distance.matrix /Users/jperrie/Documents/testfile.txt -optics.epsilon 1.0 -optics.minpts 2 -resulthandler ResultWriter -out /Applications/elki-0.7.0/elkioutputtest

I use the DBIDRangeDatabaseConnection instead of an input file to create indices 1 through 4, and pass in a distance matrix in the following format, where each line holds two indices and a distance:

1 2 0.0895585119724274
1 3 0.19458931684494
2 3 0.196315720677376
1 4 0.137940123677254
2 4 0.135852232575417
3 4 0.141511023044586

Any pointers to where I'm going wrong would be appreciated.

Has QUIT--Anony-Mousse
Froblinkin
  • Have you tried *starting to count at 0*? It looks as if it assumes the distance of the first point to any other point is 0? – Has QUIT--Anony-Mousse Jan 05 '16 at 18:49
  • P.S. `OPTICSHeap` is usually faster than `OPTICSList` because a heap can find the best candidate in O(log n) instead of O(n). It may or may not make a measurable performance difference. The heap version is the older and better-tested code, and thus the default. – Erich Schubert Jan 06 '16 at 14:21
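
The heap-versus-list point in the last comment can be illustrated with a small Python sketch. This is not ELKI's Java code; `pop_min_list` and `pop_min_heap` are hypothetical helper names, and the candidate values are made up. It only shows the two ways of picking the candidate with the smallest reachability.

```python
import heapq

def pop_min_list(candidates):
    """Linear scan over an unsorted list: O(n) per extraction."""
    best = min(range(len(candidates)), key=lambda i: candidates[i][0])
    return candidates.pop(best)

def pop_min_heap(heap):
    """Binary heap: O(log n) per extraction."""
    return heapq.heappop(heap)

# (reachability, point id) pairs; illustrative values only.
candidates = [(0.19, 2), (0.09, 1), (0.14, 3)]

as_list = list(candidates)
as_heap = list(candidates)
heapq.heapify(as_heap)

# Both strategies return the same candidate; only the cost differs.
print(pop_min_list(as_list))
print(pop_min_heap(as_heap))
```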

2 Answers

When I change your distance matrix to start counting at 0, then it appears to work:

ID=0 reachdist=Infinity predecessor=-2147483648
ID=1 reachdist=0.0895585119724274 predecessor=-2147483648
ID=3 reachdist=0.135852232575417 predecessor=1
ID=2 reachdist=0.141511023044586 predecessor=3

Maybe you should file a bug report - to me, this appears to be a bug. Also, predecessor=-2147483648 should probably be predecessor=None or something like that.
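
As a cross-check on the listing above, here is a minimal Python sketch of OPTICS run on the four points with 0-based indices. It is not ELKI's implementation. It assumes that minpts counts the query point itself and that ties are broken by the lower index; `PAIRS`, `dist`, and `optics` are names made up for this illustration. Under those assumptions it yields the same order and the same reachability values as the output above.

```python
import math

# Pairwise distances from the question, re-indexed to start at 0.
PAIRS = {
    (0, 1): 0.0895585119724274,
    (0, 2): 0.19458931684494,
    (1, 2): 0.196315720677376,
    (0, 3): 0.137940123677254,
    (1, 3): 0.135852232575417,
    (2, 3): 0.141511023044586,
}

def dist(a, b):
    """Symmetric lookup; the distance of a point to itself is 0."""
    if a == b:
        return 0.0
    return PAIRS[(min(a, b), max(a, b))]

def optics(n, eps, minpts):
    """Return the cluster order as (id, reachability, predecessor) tuples."""
    reach = {i: math.inf for i in range(n)}
    pred = {i: None for i in range(n)}
    processed = set()
    order = []
    for start in range(n):
        if start in processed:
            continue
        seeds = {start}
        while seeds:
            # Pick the unprocessed candidate with the smallest reachability.
            cur = min(seeds, key=lambda i: (reach[i], i))
            seeds.remove(cur)
            processed.add(cur)
            order.append((cur, reach[cur], pred[cur]))
            # Distances to all neighbors within eps, including cur itself.
            neigh = sorted(dist(cur, j) for j in range(n) if dist(cur, j) <= eps)
            if len(neigh) < minpts:
                continue  # cur is not a core point
            core = neigh[minpts - 1]
            for j in range(n):
                if j in processed or dist(cur, j) > eps:
                    continue
                new_reach = max(core, dist(cur, j))
                if new_reach < reach[j]:
                    reach[j] = new_reach
                    pred[j] = cur
                    seeds.add(j)
    return order

for pid, r, p in optics(n=4, eps=1.0, minpts=2):
    print(f"ID={pid} reachdist={r} predecessor={p}")
```

The sketch reports 0 as the predecessor of point 1, where the ELKI listing shows -2147483648.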

Has QUIT--Anony-Mousse

This is due to a recent change that may not yet be reflected correctly in the documentation.

When you do multiple invocations in the MiniGUI, ELKI will assign fresh object DBIDs. So if you have a data set with 100 objects, the first run would use 0-99, the second 100-199, the third 200-299, etc. This can be desired (if you think of longer-running processes, you want object IDs to be unique), but it can also be surprising behavior.

However, this makes precomputed distance matrices really hard to use, in particular with real data. Therefore, these classes were changed to use offsets. So the format of the distance matrix now is:

DBIDoffset1 DBIDoffset2 distance

where offset 0 = start + 0 is the first object.
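
A file that counts from 1, like the one in the question, can be rewritten into this offset format with a few lines of Python. This is a minimal sketch; `to_offsets` is a hypothetical helper and not part of ELKI. The distance column is kept as text so that no digits are lost to float formatting.

```python
def to_offsets(lines, start=1):
    """Rewrite 'id1 id2 distance' lines so the IDs become 0-based offsets."""
    out = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        a, b, d = line.split()
        out.append(f"{int(a) - start} {int(b) - start} {d}")
    return out

# The matrix from the question, which counts object IDs from 1.
original = [
    "1 2 0.0895585119724274",
    "1 3 0.19458931684494",
    "2 3 0.196315720677376",
    "1 4 0.137940123677254",
    "2 4 0.135852232575417",
    "3 4 0.141511023044586",
]

for row in to_offsets(original):
    print(row)
```

The rewritten rows refer to offsets 0 through 3, which is the range four objects would occupy under the format described above.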

When I'm back in the office (and do not forget), I will 1. update the documentation to reflect this, 2. provide an offset parameter so that you can continue counting starting at 1, 3. make the default distance "NaN" or "infinity", and 4. add a sanity check that warns if you have 100 objects, but distances are given for objects 1-100 instead of 0-99.
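
Until such a check exists, something like point 4 can be approximated outside ELKI before running it. The following Python sketch is only an illustration of the idea; `check_offsets` is a hypothetical name and this is not how ELKI implements or will implement the check.

```python
def check_offsets(pairs, count):
    """Return warnings if the offsets used do not look like 0..count-1."""
    used = {i for a, b in pairs for i in (a, b)}
    warnings = []
    out_of_range = sorted(i for i in used if i < 0 or i >= count)
    if out_of_range:
        warnings.append(f"offsets outside 0..{count - 1}: {out_of_range}")
    if 0 not in used:
        warnings.append("offset 0 never appears; the file may count from 1")
    return warnings

# Index pairs from the question's file, which counts from 1, with 4 objects.
question_pairs = [(1, 2), (1, 3), (2, 3), (1, 4), (2, 4), (3, 4)]
for message in check_offsets(question_pairs, count=4):
    print("warning:", message)
```

For the question's file this reports both problems: offset 4 is out of range for four objects, and offset 0 is never used.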

Erich Schubert