How do I know how well my clustering of geospatial data has worked?

Question

I have a number of coordinate points, each associated with a particular landmark; however, they have varying and unknown degrees of accuracy. For each of these landmarks I also have the coordinates recorded when a visitor says they are 'at the landmark'.

I would like to use the 'at landmark' coordinates to improve the accuracy of the landmark locations for future visitors. However, as I change the parameters of the clustering algorithm, I have no way of knowing whether, on average, the suggested locations are actually better than the existing ones.

I would like to create an objective function which I could use as a proxy for this - any thoughts?

Note that Google Maps API calls will likely be unreliable because the landmarks' addresses are imperfect.

score 0 · Answer 1 · answered Nov 29 '18 at 18:20

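One example of such an objective is the posterior probability from a Gaussian mixture model (GMM), which tells you how strongly each point belongs to each cluster. You can find some examples here: https://ch.mathworks.com/help/stats/clustering-using-gaussian-mixture-models.html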
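
As a rough illustration of what "posterior" means here, the sketch below computes the posterior membership probabilities of one point under a two-component Gaussian mixture whose parameters are already known. The component weights, centres, and isotropic standard deviations are made-up values, the function names are hypothetical, and coordinates are treated as planar metres rather than latitude/longitude. In practice the parameters would come from fitting the mixture to the data.

```python
import math

def gaussian_pdf_2d(point, mean, sigma):
    """Density of an isotropic 2-D Gaussian with standard deviation sigma."""
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    return math.exp(-(dx * dx + dy * dy) / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)

def posterior(point, components):
    """Posterior probability of each mixture component given the point.

    components: list of (weight, mean, sigma) tuples; weights sum to 1.
    Points very far from every component can underflow to a zero total;
    a log-domain version would be more robust.
    """
    joint = [w * gaussian_pdf_2d(point, mean, sigma) for w, mean, sigma in components]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical mixture: two candidate landmark positions, in metres.
components = [
    (0.7, (0.0, 0.0), 10.0),
    (0.3, (100.0, 0.0), 10.0),
]

# A check-in 5 m from the first centre belongs to it almost with certainty.
probs = posterior((5.0, 0.0), components)
```

A posterior close to 1 for a single component suggests an unambiguous assignment, while values spread across components indicate a point that sits between clusters.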

There are of course other clustering algorithms. Which one are you using?

answered Nov 29 '18 at 18:20

jubueche


Thanks @jbuchel! At the moment I'm using DBSCAN, although I'm not particularly wedded to it (part of its appeal is that I don't need to specify the number of clusters). If I understand correctly, your proposal offers an interesting alternative way to cluster, but what I would really like is a way to assess the performance of my (or another) clustering algorithm. I'm thinking of something that would give me some measure of confidence in my suggested new location, perhaps based on the number of other clusters and/or the spread of points within the cluster. – jedge Nov 29 '18 at 18:55
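
One way to turn the idea in this comment into a number is sketched below. For a landmark, it takes the cluster labels produced by the clustering step and reports the share of check-ins that fall in the largest cluster, together with the median great-circle distance of that cluster's points from its centre. The function names, the two score definitions, and the sample coordinates are assumptions for illustration and were not proposed in the thread. The noise label of -1 follows the convention used by scikit-learn's DBSCAN.

```python
import math
from collections import Counter
from statistics import median

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))

def cluster_confidence(points, labels, noise_label=-1):
    """Return (dominance, spread_m, centre) for the largest non-noise cluster.

    dominance: fraction of all points that belong to the largest cluster.
    spread_m:  median distance of that cluster's points from its centre.
    Returns None if every point is labelled as noise.
    """
    counts = Counter(l for l in labels if l != noise_label)
    if not counts:
        return None
    best, size = counts.most_common(1)[0]
    members = [p for p, l in zip(points, labels) if l == best]
    # Coordinate-wise median as the cluster centre (assumes no date-line wrap).
    centre = (median(p[0] for p in members), median(p[1] for p in members))
    spread_m = median(haversine_m(p, centre) for p in members)
    return size / len(points), spread_m, centre

# Made-up check-ins: three close together, one far away and marked as noise.
points = [(51.5007, -0.1246), (51.5008, -0.1247), (51.5006, -0.1245), (51.5100, -0.1000)]
labels = [0, 0, 0, -1]
dominance, spread_m, centre = cluster_confidence(points, labels)
```

High dominance combined with a small spread would argue for moving the landmark to `centre`. Low dominance or a large spread would argue for leaving it where it is. Any thresholds would have to be chosen from the data.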

score 0 · Answer 2 · answered Nov 30 '18 at 11:48

If you want to reduce all these user tags to a single coordinate, I would suggest simply using the median (except near the date line, where longitudes wrap around).

The reason is that the median has a very high breakdown point (50%), i.e., it is robust to outliers.
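
A minimal sketch of this suggestion, using only the Python standard library, is shown below. The function names and sample coordinates are made up for illustration. It takes the median of latitude and longitude separately and assumes the points do not straddle the ±180° meridian. The mean is computed alongside it only to show the effect of a single outlier.

```python
from statistics import mean, median

def median_location(points):
    """Coordinate-wise median of (lat, lon) pairs given in degrees."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return median(lats), median(lons)

def mean_location(points):
    """Coordinate-wise mean, shown only for comparison."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return mean(lats), mean(lons)

# Four consistent check-ins plus one gross outlier (made-up values).
points = [
    (48.8583, 2.2944),
    (48.8584, 2.2945),
    (48.8585, 2.2946),
    (48.8584, 2.2943),
    (40.0, 10.0),
]

robust = median_location(points)   # stays with the four consistent points
fragile = mean_location(points)    # dragged well over a degree by the outlier
```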

answered Nov 30 '18 at 11:48

Has QUIT--Anony-Mousse


Thank you! I actually started with this; however, visual inspection revealed that it was making some strange recommendations. For example, the coordinates reported by visitors sometimes form, say, two clear clusters. The median approach can then recommend a point between the two clusters (because it takes the median latitude and median longitude independently). There are similar problems when the coordinates are all widely spread out, in which case I would prefer not to recommend a move at all. – jedge Nov 30 '18 at 12:05
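
The failure mode described in this comment is easy to reproduce. In the sketch below the check-ins form two tight groups of equal size, and the coordinate-wise median lands in the empty space between them. The coordinates are made-up planar values and the function name is hypothetical. The distance from the suggested location to its nearest reported point is offered as one possible warning sign, which is an assumption and not something proposed in the thread.

```python
import math
from statistics import median

def coordinate_median(points):
    """Median of x and median of y, taken independently."""
    return median(p[0] for p in points), median(p[1] for p in points)

# Two tight groups of equal size (made-up planar coordinates).
group_a = [(0, 0), (1, 0), (0, 1)]
group_b = [(10, 10), (11, 10), (10, 11)]
points = group_a + group_b

centre = coordinate_median(points)

# Distance from the suggested location to the closest reported point.
nearest = min(math.dist(centre, p) for p in points)
```

Here `centre` falls midway between the groups, and `nearest` is several times the spacing inside either group, which could serve as a signal to withhold a recommendation.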
