
I want to cluster data consisting of object names, x coordinates, y coordinates and a corresponding temperature. I am trying the mean shift clustering algorithm to group nearby objects by location and temperature, i.e. to identify hot and cold areas. Below are my code and a small sample of the data. With the default settings it returns only a single cluster, and the plot is not displayed. What might be wrong with the following code?

import numpy as np  
from mpl_toolkits.mplot3d import Axes3D  
import pandas as pd  
from sklearn.decomposition import PCA    
from sklearn.cluster import MeanShift, estimate_bandwidth  
import matplotlib.pyplot as plt  
from itertools import cycle  

data = pd.read_csv("data.csv")

centers = [[1, 1, 1], [0,0,0], [0,0,0]]  
X= data._get_numeric_data()  
bandwidth = estimate_bandwidth()  

ms = MeanShift()  
ms.fit(X)  
labels = ms.labels_  
cluster_centers = ms.cluster_centers_  

print labels  
print cluster_centers  

fig = plt.figure()  
ax = plt.axes(projection='3d')  
x = data['x_cordinate']  
y=data['y_cordinate']  
z=data['tpa']  
c=labels  
ax.scatter(x,y,z, c=c)  
plt.show()  

data.csv:

name,x_cordinate,y_cordinate,temperature
Ctrs3,5189200,6859000,0.3998434286
Ctrs4,5173360,6812800,0.4779542857
Ctrs5,5660440,6812800,0.7044195918
Cstrs3,1935400,5929720,0
Cstrs4,1953880,5929720,0
Cstrs5,491320,2689120,0
Cltrs3,3436240,5884840,0.3998434286
Cltrs4,3296320,5884840,0.4779542857
Cltrs5,5426800,5725120,0.7044195918

Harshad

1 Answer


estimate_bandwidth needs an argument (your data). Does this code run?
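The sketch below shows the fix the answer is pointing at. It reads the nine sample rows from the question inline through `io.StringIO` instead of `data.csv`. It selects the numeric columns by name instead of calling the private `_get_numeric_data()` helper. It then passes the resulting estimate to `MeanShift`.

```python
from io import StringIO

import pandas as pd
from sklearn.cluster import MeanShift, estimate_bandwidth

# The nine sample rows from the question; use pd.read_csv("data.csv") for the real file.
CSV = """name,x_cordinate,y_cordinate,temperature
Ctrs3,5189200,6859000,0.3998434286
Ctrs4,5173360,6812800,0.4779542857
Ctrs5,5660440,6812800,0.7044195918
Cstrs3,1935400,5929720,0
Cstrs4,1953880,5929720,0
Cstrs5,491320,2689120,0
Cltrs3,3436240,5884840,0.3998434286
Cltrs4,3296320,5884840,0.4779542857
Cltrs5,5426800,5725120,0.7044195918
"""

data = pd.read_csv(StringIO(CSV))
X = data[["x_cordinate", "y_cordinate", "temperature"]].to_numpy()

# estimate_bandwidth needs the data it is estimating a bandwidth for.
bandwidth = estimate_bandwidth(X)

# Hand the estimate to MeanShift explicitly. A bare MeanShift() runs its own
# estimate_bandwidth with the default quantile, so computing one and not
# passing it changes nothing.
ms = MeanShift(bandwidth=bandwidth).fit(X)
print(bandwidth)
print(ms.labels_)
```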

Anyway... when this happens to me, I pass a smaller value of the quantile parameter to estimate_bandwidth than the default 0.3 (and pass that bandwidth estimate to the MeanShift constructor!). A smaller quantile gives a smaller bandwidth, which usually splits the data into more clusters.
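The quantile sweep below runs on synthetic stand-in data: three Gaussian blobs from `make_blobs`, not the asker's dataset. It shows how the estimated bandwidth, and with it the cluster count, responds as the quantile drops below the default.

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

# Synthetic stand-in data (hypothetical), NOT the asker's temperature data.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

bandwidths = {}
n_clusters = {}
for quantile in (0.3, 0.2, 0.1):  # 0.3 is estimate_bandwidth's default
    bw = estimate_bandwidth(X, quantile=quantile)
    # Pass the estimate to the constructor, as the answer stresses.
    labels = MeanShift(bandwidth=bw).fit(X).labels_
    bandwidths[quantile] = bw
    n_clusters[quantile] = len(np.unique(labels))
    print(f"quantile={quantile}: bandwidth={bw:.3f}, clusters={n_clusters[quantile]}")
```

The estimate is the average distance to each point's k-th nearest neighbour, with k = int(quantile × n_samples). Lowering the quantile can therefore only shrink it. On a tiny sample such as the nine rows in the question, keep quantile × n_samples at 2 or more. Below that, each point's only "neighbour" is itself, the estimate collapses to 0, and MeanShift rejects a bandwidth of 0.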

If you already know a good bandwidth a priori, you are best off using it.
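When a sensible bandwidth is known, it can go straight to the constructor with no estimation step. The sketch below uses the question's nine sample rows. The value 1.5e6, in raw coordinate units, is only the "plausible bandwidth" welch mentions in the comments. It is a hypothetical choice here, not a recommendation for the real data.

```python
import numpy as np
from sklearn.cluster import MeanShift

# x_cordinate, y_cordinate, temperature for the nine sample rows in the question.
X = np.array([
    [5189200, 6859000, 0.3998434286],
    [5173360, 6812800, 0.4779542857],
    [5660440, 6812800, 0.7044195918],
    [1935400, 5929720, 0.0],
    [1953880, 5929720, 0.0],
    [491320, 2689120, 0.0],
    [3436240, 5884840, 0.3998434286],
    [3296320, 5884840, 0.4779542857],
    [5426800, 5725120, 0.7044195918],
])

# Assumed a-priori bandwidth in raw coordinate units (hypothetical value).
KNOWN_BANDWIDTH = 1.5e6

ms = MeanShift(bandwidth=KNOWN_BANDWIDTH).fit(X)
print(ms.labels_)
print(ms.cluster_centers_)
```

The coordinates are in the millions and the temperatures are below 1. A bandwidth in these raw units therefore effectively groups the points by location alone.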

welch
  • Yes, I also tried the default values and a value of 0.5; it ran for approximately 20 hours and gave only one cluster. – Harshad Jan 02 '17 at 12:41
  • How about posting the code and data you are actually using? The code above does not run because of a syntax error. With the syntax error fixed, it does not run because estimate_bandwidth works in batches of 500 items (you've provided 9). After providing a plausible bandwidth (1.5e6) directly to the MeanShift constructor, it does not run because of a misnamed column in the display portion of the code. I've run out of enthusiasm, or I would point out that your data is badly scaled (x and y vs. temperature scales) to be used directly in a distance-based method like mean shift or k-means. – welch Jan 05 '17 at 18:07
  • Regarding the data: the file is too big to share here, but if you provide your email, a cloud repository ID, or some other means, I can share it. Regarding k-means: the data can be clustered into n clusters, but I am also trying to find the exact differences between the clusters produced by mean shift, k-means and a density-based algorithm. – Harshad Jan 10 '17 at 13:37
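The script below is a hedged sketch that combines the points from the answer and comments. It reads the nine sample rows inline. It standardizes the three numeric columns with scikit-learn's `StandardScaler`, which is one possible fix for the scaling problem welch raises; a different weighting of location against temperature may suit the hot/cold-area goal better. It passes an explicit bandwidth estimate to `MeanShift` and plots the `temperature` column that actually exists in the CSV, instead of `tpa`. The output file name `clusters.png` is an arbitrary choice.

```python
from io import StringIO

import matplotlib.pyplot as plt
import pandas as pd
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  registers "3d" on older Matplotlib
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.preprocessing import StandardScaler

# The nine sample rows from the question; use pd.read_csv("data.csv") for the real file.
CSV = """name,x_cordinate,y_cordinate,temperature
Ctrs3,5189200,6859000,0.3998434286
Ctrs4,5173360,6812800,0.4779542857
Ctrs5,5660440,6812800,0.7044195918
Cstrs3,1935400,5929720,0
Cstrs4,1953880,5929720,0
Cstrs5,491320,2689120,0
Cltrs3,3436240,5884840,0.3998434286
Cltrs4,3296320,5884840,0.4779542857
Cltrs5,5426800,5725120,0.7044195918
"""

data = pd.read_csv(StringIO(CSV))
features = ["x_cordinate", "y_cordinate", "temperature"]

# Rescale each column to zero mean and unit variance so coordinates in the
# millions do not drown out temperatures between 0 and 1.
X = StandardScaler().fit_transform(data[features])

bandwidth = estimate_bandwidth(X)  # default quantile=0.3
ms = MeanShift(bandwidth=bandwidth).fit(X)
labels = ms.labels_
print(labels)
print(ms.cluster_centers_)

# Plot in the original units, coloured by cluster label.
fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
ax.scatter(data["x_cordinate"], data["y_cordinate"], data["temperature"], c=labels)
ax.set_xlabel("x_cordinate")
ax.set_ylabel("y_cordinate")
ax.set_zlabel("temperature")
fig.savefig("clusters.png")  # or plt.show() in an interactive session
```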