
I want to group a list of longitudes and latitudes (my_long_lats) around predetermined center points (my_center_Points).

When I run:

k <- kmeans(as.matrix(my_long_lats), centers = as.matrix(my_center_Points))

k$centers does not equal my_center_Points.

I assume k-means has moved my center points to the optimal centers. What I need is for my_center_Points to stay unchanged and for my_long_lats to be grouped around them.

In this link they talk about setting initial centers, but how do I set centers that won't change once I run k-means? Or is there a better clustering algorithm for this?

I could even settle for minimizing the movement of the centers.

I still have a lot to learn in R, so any help is really appreciated.

Dave2e
Coopa
    Maybe you need a distance metric instead, for example the Euclidean distance between points? – Samuel Dec 27 '17 at 20:27

2 Answers


kmeans recomputes the centers as part of the clustering; updating the centers is how the algorithm decides which points belong to which cluster. I can think of a couple of options that may help:

  1. Limit iter.max. You can set it to 1 in the kmeans call. This does not guarantee that the centers stay fixed, but they should move less than in a full run, particularly if you are dealing with large data sets.

  2. Use dummy data. You can add many dummy points to your actual data set, placed at or around the chosen centers. This puts extra weight on the predetermined centers, so they will most likely remain nearly unchanged.
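As a rough illustration of option 1, here is a minimal sketch that runs kmeans for a single iteration and measures how far the centers moved. The data and the seed are made up, the names just mirror the question, and algorithm = "Lloyd" is my own choice rather than anything stated above. As far as I can tell, with Lloyd and iter.max = 1 the labels in k$cluster are the nearest starting center for each point, while k$centers has already shifted to the cluster means. Note that kmeans uses plain Euclidean distance on the degree values, not a geodesic distance.

```r
set.seed(42)  # assumption: fixed seed so the sketch is reproducible

# Hypothetical stand-ins for my_long_lats and my_center_Points
my_long_lats <- cbind(lon = runif(200, 40, 55), lat = runif(200, 40, 55))
my_center_Points <- cbind(lon = c(44, 44, 50, 50), lat = c(44, 50, 44, 50))

# One Lloyd iteration: assign each point to its nearest starting center,
# then move each center to the mean of its assigned points.
# kmeans warns that it did not converge in 1 iteration, so the warning is suppressed.
k <- suppressWarnings(
  kmeans(my_long_lats, centers = my_center_Points,
         iter.max = 1, algorithm = "Lloyd")
)

# How far each center moved from its starting position (in degrees)
shift <- sqrt(rowSums((k$centers - my_center_Points)^2))
print(round(shift, 3))
```

If the labels are all you need, k$cluster can be used together with the original my_center_Points and k$centers can simply be ignored.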

MKR

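If the centers must not move, you can skip kmeans and assign each point to its nearest center directly. Here is the calculation using the geosphere package to properly compute distances from longitude and latitude.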

The variable closestcenter is the result which identifies the closest center to each point.

library(geosphere)

#define random data (x = longitude, y = latitude, the order distGeo expects)
centers <- data.frame(x = c(44, 44, 50, 50), y = c(44, 50, 44, 50))
pts <- data.frame(x = runif(50, 40, 55), y = runif(50, 40, 55))

#calculate the distance matrix from each defined center to each point
#columns represent centers and the rows are the data points
dm <- sapply(seq_len(nrow(centers)), function(i) distGeo(centers[i, ], pts))

#find the column with the smallest distance
closestcenter <- apply(dm, 1, which.min)

#color code the original data for verification
colors <- c("black", "red", "blue", "green")
plot(pts, col = colors[closestcenter], pch = 19)
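If installing geosphere is not an option, the same nearest-center assignment can be sketched in base R with the haversine formula. This is only an approximation under stated assumptions: it treats the Earth as a sphere of radius 6371 km, whereas distGeo works on the WGS84 ellipsoid, and the helper names haversine_m and assign_to_centers are made up for illustration.

```r
# Great-circle distance in metres on a sphere (assumed radius 6371 km)
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Index of the nearest fixed center for every point.
# pts and centers are data frames with columns x (longitude) and y (latitude).
assign_to_centers <- function(pts, centers) {
  dm <- sapply(seq_len(nrow(centers)), function(i) {
    haversine_m(centers$x[i], centers$y[i], pts$x, pts$y)
  })
  dm <- matrix(dm, nrow = nrow(pts))  # keep matrix shape even for a single point
  apply(dm, 1, which.min)
}

centers <- data.frame(x = c(44, 44, 50, 50), y = c(44, 50, 44, 50))
pts <- data.frame(x = runif(50, 40, 55), y = runif(50, 40, 55))

closest <- assign_to_centers(pts, centers)
print(table(closest))  # number of points assigned to each center
```

The centers are never updated here, so the grouping is around exactly the points you supplied.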


Dave2e