
I have pickup and drop locations as latitude/longitude pairs. I'm grouping them by pickup location using hierarchical clustering.

from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
Zd = linkage(squareform(pickDistance), method="ward", metric="haversine")
cld = fcluster(Zd, 30, criterion="distance")

Here, 'pickDistance' is the proximity matrix built from all the pickup lat-lons. For each cluster formed, I build a distance matrix from its pickup and drop locations, and the OR-Tools routing solver gives me routes for multiple vehicles.
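For reference, a proximity matrix like 'pickDistance' could be built along the following lines. This is only a minimal sketch: the coordinates, the helper names `haversine_km` and `proximity_matrix`, and the choice of kilometres with a mean Earth radius of 6371 km are assumptions for illustration, not the actual setup.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption for this sketch


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def proximity_matrix(points):
    """Square, symmetric matrix of pairwise haversine distances."""
    return [[haversine_km(p, q) for q in points] for p in points]


# Hypothetical pickup coordinates, for illustration only.
pickups = [(12.9716, 77.5946), (12.9352, 77.6245), (13.0358, 77.5970)]
pickDistance = proximity_matrix(pickups)
```

The threshold passed to fcluster is compared against the merge heights in Zd, which derive from these distances, so the unit of the matrix matters when choosing it.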

When I increase cluster_distance, the solver keeps running without returning. In the end I cancel the execution and reset cluster_distance and max_distance until I get routes.

I want to understand a few things here:

  • How do I set an optimal cluster_distance, and which clustering method would you recommend for geo-locations?

  • How does the max_distance parameter in the routing solver work? Does it apply to each vehicle, or to all the vehicles combined?

  • Is there a way to make the cluster_distance and max_distance parameters dynamic, so that they work for any number of locations in a cluster?

Kindly help.

1 Answer


K-means should work well in this case. Because k-means groups points solely by the Euclidean distance between them, you will get back clusters of locations that are close to each other.

To find the optimal number of clusters, you can try making an 'elbow' plot of the within-cluster sum of squared distances against the number of clusters. This notebook may be helpful: http://nbviewer.ipython.org/github/nborwankar/LearnDataScience/blob/master/notebooks/D3.%20K-Means%20Clustering%20Analysis.ipynb
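A rough sketch of the elbow idea, kept dependency-free so the quantity being plotted is explicit. The synthetic coordinates, the helper names (`sq_dist`, `mean_point`, `kmeans_wss`) and the fixed seeds are all assumptions for illustration. Raw degrees are treated as planar coordinates here, which is a simplification. In practice a library implementation would normally be used instead; scikit-learn's `KMeans`, for example, exposes this sum as `inertia_`.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def mean_point(group):
    """Centroid of a non-empty list of 2-D points."""
    n = len(group)
    return (sum(x for x, _ in group) / n, sum(y for _, y in group) / n)


def kmeans_wss(points, k, iters=50, seed=0):
    """Run Lloyd's k-means and return the within-cluster sum of squared distances."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers drawn from the data
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        new_centers = [mean_point(g) if g else centers[i]
                       for i, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


# Hypothetical data: three loose groups of (lat, lon) points.
data_rng = random.Random(42)
points = []
for cx, cy in [(12.90, 77.50), (12.97, 77.60), (13.05, 77.70)]:
    points += [(cx + data_rng.gauss(0, 0.005), cy + data_rng.gauss(0, 0.005))
               for _ in range(20)]

# Within-cluster sum of squares for k = 1..6; plot this against k and look
# for the k after which the curve flattens (the "elbow").
wss = {k: kmeans_wss(points, k) for k in range(1, 7)}
for k, value in wss.items():
    print(k, value)
```

Because Lloyd's algorithm can stop in a local optimum, it is common to rerun each k with several seeds and keep the lowest value before reading off the elbow.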