These are follow-up questions to the question and answer in "Approaches for spatial geodesic latitude longitude clustering in R with geodesic or great circle distances".
I would like to better understand:
Question #1: If all the lat/long values are within the same city, is it necessary to first calculate great circle distances with either fossil or distHaversine(...)? Or, within a single city, is it OK to run the clustering on the lat/long values themselves?
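To make Question #1 concrete, here is a minimal base R sketch of the comparison I have in mind. The coordinates are made up, and `haversine_m` is my own hypothetical helper (a plain Haversine formula on a sphere of radius 6,371 km), not a function from fossil or geosphere.

```r
# Hypothetical helper: great-circle (Haversine) distance in metres.
# Inputs are longitudes/latitudes in decimal degrees.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Made-up points, all within one city (latitude about 40.7 degrees).
pts <- data.frame(
  lon = c(-74.00, -73.99, -74.00),
  lat = c(40.70, 40.70, 40.71)
)

# Point 2 is 0.01 degrees east of point 1; point 3 is 0.01 degrees north.
d_lon <- haversine_m(pts$lon[1], pts$lat[1], pts$lon[2], pts$lat[2])
d_lat <- haversine_m(pts$lon[1], pts$lat[1], pts$lon[3], pts$lat[3])

# In raw degrees both pairs are equally far apart (0.01),
# but on the ground the east-west step is shorter by about cos(latitude).
ratio <- d_lon / d_lat
```

So even inside one city, a degree of longitude and a degree of latitude do not cover the same ground distance. What I am unsure about is whether that distortion matters enough to change the clustering.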
Question #2: jlhoward notes that:
It's worth noting that these methods require that all points must go into some cluster. If you just ask which points are close together, and allow that some cities don't go into any cluster, you get very different results.
In my case I would like to ask just "which points are close together", without forcing every point into a cluster. How can I do this?
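One way I could imagine doing this, as a rough sketch only: single-linkage hierarchical clustering cut at a distance threshold, with any point that ends up alone marked as unclustered. This is my guess, not necessarily what jlhoward meant. The coordinates, the 500 m threshold, and the `haversine_m` helper are all assumptions for illustration.

```r
# Hypothetical helper: great-circle (Haversine) distance in metres.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Made-up data: three points near each other, two isolated ones.
pts <- data.frame(
  lon = c(-74.000, -73.999, -74.001, -73.950, -74.050),
  lat = c(40.700, 40.701, 40.700, 40.750, 40.650)
)

# Full pairwise distance matrix in metres.
n <- nrow(pts)
d <- outer(seq_len(n), seq_len(n), function(i, j) {
  haversine_m(pts$lon[i], pts$lat[i], pts$lon[j], pts$lat[j])
})

# Single linkage cut at height h: points end up in the same group
# when they are connected by a chain of hops no longer than h.
threshold_m <- 500  # assumed threshold
hc <- hclust(as.dist(d), method = "single")
grp <- cutree(hc, h = threshold_m)

# Groups of size 1 are points with no neighbour within the threshold;
# mark them NA instead of treating them as clusters.
group_size <- ave(grp, grp, FUN = length)
pts$cluster <- ifelse(group_size > 1, grp, NA_integer_)
```

With this data the first three points share a cluster id and the last two get NA. Is this a reasonable approach, or is there a more standard method for this?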
Question #3: To include one or two factor variables in the clustering (in addition to lat/long), is it as simple as adding those factor variables to the df on which the clustering is run?
Please confirm. Thanks!