
This is a follow-up to a previous question, where we noted that using Euclidean distances with (lat, long) coordinates does not yield correct results. I read in the documentation that ELKI supports geographic data, namely in its distance functions, which are available to the various clustering algorithms. In the ELKI user interface, I can see options to replace the default (Euclidean) distance function with a better-suited one. In that case you also need to provide a datum, which makes sense, since you have to tell ELKI how the data is projected. My options in the UI are to use "geo.LngLatDistanceFunction", since I am using (x, y) coordinates, and "WGS84SpheroidEarthModel", since the data is in EPSG:4326.

I am trying to parameterize my algorithm accordingly in Java, but I am not sure how to do it. If I initialize my parameters like this:

    ListParameterization params2 = new ListParameterization();
    params2.addParameter(de.lmu.ifi.dbs.elki.algorithm.clustering.DBSCAN.Parameterizer.MINPTS_ID, minPoints);
    params2.addParameter(de.lmu.ifi.dbs.elki.algorithm.clustering.DBSCAN.Parameterizer.EPSILON_ID, epsilon);

Could I set the distance function like this?

    params2.addParameter(de.lmu.ifi.dbs.elki.algorithm.DistanceBasedAlgorithm.DISTANCE_FUNCTION_ID,
        de.lmu.ifi.dbs.elki.distance.distancefunction.geo.LngLatDistanceFunction.class);

What about the geo.model? (I have no clue about this)

doublebyte

1 Answer


The default earth model is SphericalVincentyEarthModel, which is supposedly a bit faster but assumes a spherical earth instead of a spheroid. This should not make much of a difference unless you need precision to the meter: the maximum error should be 0.3% of the distance, according to this answer.
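To illustrate what a spherical model computes, and why plain Euclidean distance on degrees is misleading, here is a minimal sketch that is independent of ELKI. It uses the well-known special case of Vincenty's formula for a sphere. The class name, method names, and the mean earth radius constant are my own assumptions for illustration, not ELKI's implementation.

```java
public class SphereDistanceSketch {
  // Mean earth radius in metres (assumed constant for this sketch).
  static final double EARTH_RADIUS_M = 6_371_008.8;

  /** Great-circle distance in metres; Vincenty's formula specialised to a sphere. */
  static double sphericalDistance(double lat1, double lng1, double lat2, double lng2) {
    double p1 = Math.toRadians(lat1);
    double p2 = Math.toRadians(lat2);
    double dl = Math.toRadians(lng2 - lng1);
    double a = Math.cos(p2) * Math.sin(dl);
    double b = Math.cos(p1) * Math.sin(p2) - Math.sin(p1) * Math.cos(p2) * Math.cos(dl);
    double c = Math.sin(p1) * Math.sin(p2) + Math.cos(p1) * Math.cos(p2) * Math.cos(dl);
    // atan2 keeps the result numerically stable for very small and very large angles.
    return EARTH_RADIUS_M * Math.atan2(Math.sqrt(a * a + b * b), c);
  }

  /** Naive Euclidean distance on raw coordinates; the result is in degrees, not metres. */
  static double euclideanDegrees(double lat1, double lng1, double lat2, double lng2) {
    return Math.hypot(lat2 - lat1, lng2 - lng1);
  }

  public static void main(String[] args) {
    // One degree of longitude at the equator and at 60 degrees north.
    System.out.println("equator, spherical (m): " + sphericalDistance(0, 0, 0, 1));
    System.out.println("60N, spherical (m):     " + sphericalDistance(60, 0, 60, 1));
    System.out.println("equator, Euclidean:     " + euclideanDegrees(0, 0, 0, 1));
    System.out.println("60N, Euclidean:         " + euclideanDegrees(60, 0, 60, 1));
  }
}
```

The Euclidean value is identical for both pairs, while the spherical distance at 60°N is roughly half of that at the equator. This is why an epsilon expressed in degrees does not correspond to a fixed distance on the ground.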

To set the earth model parameter, use EarthModel.MODEL_ID as the option ID (as referenced by the Parameterizer of LngLatDistanceFunction). When looking for the appropriate option ID, always have a look at the Parameterizers; we are slowly moving all the option IDs into the Parameterizers.
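Put together, the parameterization could look roughly like the sketch below. It assumes an ELKI version in which EarthModel and WGS84SpheroidEarthModel live in the package de.lmu.ifi.dbs.elki.math.geodesy; the package names, the class name GeoParams, and the helper method are assumptions for illustration, so check them against the version you are using.

```java
import de.lmu.ifi.dbs.elki.algorithm.DistanceBasedAlgorithm;
import de.lmu.ifi.dbs.elki.distance.distancefunction.geo.LngLatDistanceFunction;
// Assumed package for the earth models; verify against your ELKI version.
import de.lmu.ifi.dbs.elki.math.geodesy.EarthModel;
import de.lmu.ifi.dbs.elki.math.geodesy.WGS84SpheroidEarthModel;
import de.lmu.ifi.dbs.elki.utilities.optionhandling.parameterization.ListParameterization;

public class GeoParams {
  /** Hypothetical helper: returns parameters selecting a geo distance and an earth model. */
  static ListParameterization geoDistanceParams() {
    ListParameterization params = new ListParameterization();
    // Distance function expecting (longitude, latitude) vectors.
    params.addParameter(DistanceBasedAlgorithm.DISTANCE_FUNCTION_ID, LngLatDistanceFunction.class);
    // Earth model; omit this line to keep the default spherical model.
    params.addParameter(EarthModel.MODEL_ID, WGS84SpheroidEarthModel.class);
    return params;
  }
}
```

The DBSCAN parameters (MINPTS_ID, EPSILON_ID) would be added to the same ListParameterization, as in the question.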

Erich Schubert
  • From what you tell me, it is probably OK to treat the earth as a sphere. I am more concerned about the results I get after changing the distance function to a geo one. If the snippet I showed above for setting the LngLatDistanceFunction is correct, the results are a bit surprising. When I run DBSCAN, [the clusters returned are sets of a repeated point](https://ladybug.no-ip.org/files/clusters_lonlat.png). [These](https://ladybug.no-ip.org/files/clusters_manhattan.png) are the clusters returned when choosing a non-geo distance function (ManhattanDistanceFunction). Any ideas why this may be happening? – doublebyte May 15 '14 at 09:02
  • I should add that my input data is a relation of number vectors (lon, lat), which according to [ELKI's documentation](http://elki.dbs.ifi.lmu.de/wiki/HowTo/GeoMining) is supposed to be geo: `Relation<NumberVector<?>> vectors = db.getRelation(TypeUtil.NUMBER_VECTOR_FIELD);` – doublebyte May 15 '14 at 09:06
  • I decided to turn my comments into another [question](http://stackoverflow.com/questions/23684070/using-a-geo-distance-function-on-elki) – doublebyte May 15 '14 at 16:38