I have arrays of latitude and longitude data points on which I want to do hierarchical clustering. Here is my code:
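
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.cluster import AgglomerativeClustering
    from sklearn.neighbors import kneighbors_graph

    # longitude, latitude: 1-D arrays of equal length.
    # zip() is lazy in Python 3, so build the (n, 2) array directly.
    X = np.column_stack((longitude, latitude))

    # haversine is my own distance function, which I am trying to plug in here
    knn_graph = kneighbors_graph(X, 30, include_self=False, metric=haversine)

    for connectivity in (None, knn_graph):
        for n_clusters in (5, 8, 10, 15, 20):
            plt.figure(figsize=(4, 5))
            for index, linkage in enumerate(('average', 'complete', 'ward')):
                model = AgglomerativeClustering(linkage=linkage,
                                                connectivity=connectivity,
                                                n_clusters=n_clusters)
                model.fit(X)
                plt.scatter(X[:, 0], X[:, 1], c=model.labels_,
                            cmap=plt.cm.nipy_spectral)
                plt.title('linkage=%s (n_clusters=%s)' % (linkage, n_clusters),
                          fontdict=dict(verticalalignment='top'))
                # x axis is longitude, y axis is latitude
                plt.axis([-122.6, -121.6, 37.1, 37.9])
            plt.show()
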
The problem is with kneighbors_graph. It has a parameter called metric, which defines how the distance between points is computed: http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.kneighbors_graph.html

I want to define my own metric (the real distance, based on the longitude, the latitude, and the Earth's radius), but it seems I cannot plug in my own function. Any ideas?
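
For reference, the distance I have in mind is the great-circle (haversine) distance. Below is a minimal sketch of the kind of function I would like to use as the metric. It assumes each point is a (longitude, latitude) pair in degrees, matching the column order of X, and it uses a mean Earth radius of 6371 km; the name EARTH_RADIUS_KM and that value are my own assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, in kilometres


def haversine(p, q):
    """Great-circle distance in km between two (longitude, latitude) points in degrees."""
    lon1, lat1 = math.radians(p[0]), math.radians(p[1])
    lon2, lat2 = math.radians(q[0]), math.radians(q[1])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    # Haversine formula: a is the squared half-chord length between the points
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    # Clamp to 1.0 to guard against rounding error for near-antipodal points
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))
```

The function takes two points and returns a single number, so it only indexes p[0], p[1], q[0], and q[1] and should work with tuples, lists, or rows of a NumPy array.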