I am new to R and to (unsupervised) machine learning. I'm trying to find the best clustering approach for my data in R.
What is my data about?
I have a dataset with roughly 800 longitude/latitude (WGS84) coordinates, all within one city.
Longitude is in the range 6.90 - 6.95; latitude is in the range 52.29 - 52.33.
What do I want?
I want to find "hotspots" based on point density. For example: at least 5 long/lat points within a range of 50 meters. This is a point plot example:
Why do I want this?
For example: let's assume that every single point is a car accident. By clustering the points I hope to see which areas need attention (an area with at least x points within a range of y meters needs attention).
What have I found?
The following clustering algorithms seem suitable for my problem:
- DBSCAN (https://cran.r-project.org/web/packages/dbscan/dbscan.pdf)
- HDBSCAN (https://cran.r-project.org/web/packages/dbscan/vignettes/hdbscan.html)
- OPTICS (https://www.rdocumentation.org/packages/dbscan/versions/0.9-8/topics/optics)
- City Clustering Algorithm (https://cran.r-project.org/web/packages/osc/vignettes/paper.pdf)
My questions
- What is the best solution or algorithm for my case in R?
- Is it true that I have to convert my long/lat coordinates to a (Haversine) distance matrix first?
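
To make the second question concrete, here is a minimal base-R sketch of what I understand a Haversine distance matrix to be, together with the "at least 5 points within 50 meters" check. The function name `haversine_m`, the Earth radius of 6371000 m, and the randomly generated points are my own assumptions for illustration; they are not my real data and not taken from any package.

```r
# Haversine (great-circle) distance in metres between points given in
# decimal degrees. Vectorised; r is an assumed mean Earth radius.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Made-up points inside the coordinate range described above (not real data).
set.seed(1)
pts <- data.frame(
  lon = runif(800, 6.90, 6.95),
  lat = runif(800, 52.29, 52.33)
)

# Full pairwise distance matrix (800 x 800), in metres.
idx <- seq_len(nrow(pts))
dist_m <- outer(idx, idx, function(i, j) {
  haversine_m(pts$lon[i], pts$lat[i], pts$lon[j], pts$lat[j])
})

# Density criterion: number of points within eps metres of each point,
# counting the point itself, compared against a minimum count.
eps <- 50
min_pts <- 5
neighbours <- rowSums(dist_m <= eps)
dense <- neighbours >= min_pts

# as.dist(dist_m) turns the matrix into a "dist" object for clustering
# functions that accept precomputed distances.
```

With about 800 points the full matrix has only 640,000 entries, so computing it directly seems feasible. As far as I can tell from its documentation, `dbscan::dbscan()` accepts a `dist` object, so `as.dist(dist_m)` with `eps = 50` and `minPts = 5` would be the natural next step, but I have not confirmed that this is the recommended route.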