These are follow-up questions to the question and answer in "Approaches for spatial geodesic latitude longitude clustering in R with geodesic or great circle distances".

I would like to better understand:

Question #1: If all the lat/long values are within the same city, is it necessary to first calculate great-circle distances, using either the fossil package or distHaversine(...)?

  • Or, within a single city, is it OK to run clustering on the lat/long values themselves?

Question #2: jlhoward suggests that:

It's worth noting that these methods require that all points must go into some cluster. If you just ask which points are close together, and allow that some cities don't go into any cluster, you get very different results.

In my case I would just like to ask "which points are close together", without forcing every point into a cluster. How can I do this?

Question #3: To include one or two factor variables in the clustering (in addition to lat/long), is it as simple as adding those factor variables to the df on which the clustering is run?

Please confirm. Thanks!

Community
    One question per question please. I'll answer the first, but I think the other two depend on the exact function you use to do the clustering, so create another two questions with those, preferably as little examples with some data.... – Spacedman May 30 '14 at 07:15

1 Answer

"within a single city, is it OK to run clustering on the lat/long values themselves?"

Yes, as long as your city is on the equator, where a degree of longitude is the same distance as a degree of latitude.

Suppose I'm standing very close to the North Pole. One degree of longitude there is 1/360 of the circumference of the small circle around the pole that passes through me. Someone ten degrees east of me might be only ten feet away, while someone one degree south of me is miles away. A clustering algorithm run on raw lat-long values would conclude that the person miles away is closer to me than the person ten degrees east whom I can wave to.
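To make the polar example concrete, here is a minimal base-R sketch that compares a naive distance in degrees with a haversine great-circle distance. The function names, the spherical Earth radius, and the coordinates are assumptions chosen for illustration; they are not taken from the fossil or geosphere packages.

```r
# Mean Earth radius in metres (spherical approximation; an assumption of this sketch).
R_EARTH <- 6371000

# Haversine great-circle distance in metres between lon/lat points given in degrees.
haversine_m <- function(lon1, lat1, lon2, lat2) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * R_EARTH * asin(pmin(1, sqrt(a)))
}

# Naive "distance" that treats degrees as if they were planar units.
degree_dist <- function(lon1, lat1, lon2, lat2) {
  sqrt((lon2 - lon1)^2 + (lat2 - lat1)^2)
}

# An observer near the pole, a neighbour 10 degrees east, a neighbour 1 degree south.
me    <- c(lon = 0,  lat = 89.9999)
east  <- c(lon = 10, lat = 89.9999)
south <- c(lon = 0,  lat = 88.9999)

deg_east  <- degree_dist(me[["lon"]], me[["lat"]], east[["lon"]], east[["lat"]])
deg_south <- degree_dist(me[["lon"]], me[["lat"]], south[["lon"]], south[["lat"]])
m_east    <- haversine_m(me[["lon"]], me[["lat"]], east[["lon"]], east[["lat"]])
m_south   <- haversine_m(me[["lon"]], me[["lat"]], south[["lon"]], south[["lat"]])

# In degrees the southern neighbour looks closer; in metres the eastern one is.
print(c(deg_east = deg_east, deg_south = deg_south))
print(c(m_east = m_east, m_south = m_south))
```

The two rankings disagree, which is exactly the failure a clustering run on raw degrees would inherit. At city latitudes the effect is milder but still present, because a degree of longitude shrinks roughly with the cosine of the latitude.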

For small areas, the way to avoid computing great-circle or ellipsoidal distances is to project to a coordinate system that is near enough Cartesian that you can use Pythagoras' theorem for distance without too much error. Typically you would use the UTM zone covering your study area: a transverse Mercator projection whose line of least distortion (the zone's central meridian, which plays the role the equator plays in the ordinary Mercator projection) runs through or near your study area.
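As a rough illustration of the "project first, then use Pythagoras" idea, the sketch below uses a simple local equirectangular approximation (longitude scaled by the cosine of the mean latitude) in place of a real UTM transform. The helper local_xy, the sample coordinates, and the 1 km cut height are all hypothetical, and the approximation is only reasonable over city-sized areas away from the poles.

```r
# Mean Earth radius in metres (spherical approximation; an assumption of this sketch).
R_EARTH <- 6371000

# Project lon/lat (degrees) to approximate planar metres around the data's centre.
# This is an equirectangular approximation, NOT a UTM transform.
local_xy <- function(lon, lat) {
  to_rad <- pi / 180
  lon0 <- mean(lon)
  lat0 <- mean(lat)
  x <- R_EARTH * (lon - lon0) * to_rad * cos(lat0 * to_rad)
  y <- R_EARTH * (lat - lat0) * to_rad
  cbind(x = x, y = y)
}

# Made-up points in a city at roughly 60 degrees north (illustrative only).
pts <- data.frame(
  lon = c(24.930, 24.932, 24.935, 24.990, 24.992),
  lat = c(60.170, 60.171, 60.169, 60.200, 60.201)
)

xy <- local_xy(pts$lon, pts$lat)
d  <- dist(xy)                         # Euclidean distances, now in metres
hc <- hclust(d, method = "complete")   # ordinary hierarchical clustering
clusters <- cutree(hc, h = 1000)       # cut the tree at 1 km
print(clusters)
```

Because the coordinates are in metres after projection, the cut height has a physical meaning, which is not the case when clustering directly on degrees.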

The spTransform function in sp and rgdal will sort this out for you.
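One possible way to carry out the reprojection step is sketched below. It picks a WGS84 UTM zone from the mean coordinates and, if the sf package happens to be installed, reprojects with sf::st_transform; this is a swapped-in alternative to the spTransform route named above, used here because rgdal has since been retired from CRAN. The helper utm_epsg and the sample points are hypothetical, and the zone formula ignores the special-case zones around Norway and Svalbard.

```r
# EPSG code of the WGS84 UTM zone containing a lon/lat point (degrees).
# Northern-hemisphere zones are 32601-32660, southern ones 32701-32760.
utm_epsg <- function(lon, lat) {
  zone <- (floor((lon + 180) / 6) %% 60) + 1
  if (lat >= 0) 32600 + zone else 32700 + zone
}

# Made-up points in central London (illustrative only).
pts <- data.frame(
  lon = c(-0.1276, -0.1000, -0.0900),
  lat = c(51.5072, 51.5200, 51.5000)
)
epsg <- utm_epsg(mean(pts$lon), mean(pts$lat))
print(epsg)

# Optional: reproject with sf if it is available, then cluster on metres.
if (requireNamespace("sf", quietly = TRUE)) {
  p_ll  <- sf::st_as_sf(pts, coords = c("lon", "lat"), crs = 4326)
  p_utm <- sf::st_transform(p_ll, epsg)
  xy    <- sf::st_coordinates(p_utm)   # matrix of eastings and northings in metres
  d     <- dist(xy)                    # planar distances, ready for hclust() etc.
  print(d)
}
```

The resulting distance object can be passed to whichever clustering function is in use, in the same way as a matrix of great-circle distances would be.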

Spacedman