I'm working on a scatter-plot of clusters.
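With sklearn.cluster.KMeans I get a list of labels (kmeans.labels_) with the same length as my 2D matrix X.
Running
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

k = 5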
df = pd.read_csv('data_latlong.csv')
lat = df['Lat'].values
long = df['Long'].values
X = np.matrix(list(zip(lat, long)))
kmeans = KMeans(n_clusters=k).fit(X)
plt.figure(figsize=(10, 10))
plt.scatter(X[:, 0], X[:, 1], c=kmeans.labels_)
plt.title("n = 5")
plt.show()
gives me the following ValueError:
ValueError: 'c' argument has 3909 elements, which is not acceptable for use with 'x' with size 3909, 'y' with size 3909.
Any suggestions how to handle that?
Thanks!
Solution - preparing X the right way:
Instead of
X = np.matrix(list(zip(lat, long)))
I used
X = np.array([lat, long]).T
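.T transposes the array (used here instead of zip()), so X is a plain ndarray with one row per point. With that I got the right shape for X[:, 0] and X[:, 1]. The underlying problem was np.matrix: slices of a matrix always stay 2D, so X[:, 0] was a (3909, 1) matrix rather than a flat array of 3909 values, which is apparently what scatter could not match against the 3909 labels in c.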
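To make the shape difference visible, here is a minimal sketch that compares the two ways of building X. The four coordinate pairs are made-up stand-ins for the contents of data_latlong.csv, and the names X_matrix, X_array, X_stacked and flat are only used for illustration; np.column_stack and np.asarray are alternatives I'm adding, not part of the original fix.

```python
import numpy as np

# Hypothetical stand-in coordinates; the original post reads them from data_latlong.csv.
lat = np.array([52.5, 48.1, 50.9, 53.6])
long = np.array([13.4, 11.6, 6.9, 10.0])

# The original construction: an np.matrix, whose slices always stay 2-D.
X_matrix = np.matrix(list(zip(lat, long)))
# The fix from the answer: a plain ndarray, transposed to one row per point.
X_array = np.array([lat, long]).T

col_matrix = X_matrix[:, 0]
col_array = X_array[:, 0]

print(col_matrix.shape)  # (4, 1): still two-dimensional
print(col_array.shape)   # (4,): flat, same shape as a label vector

# Alternative way to build the same ndarray without the transpose.
X_stacked = np.column_stack((lat, long))

# If an np.matrix already exists, converting it first also gives flat columns.
flat = np.asarray(X_matrix)[:, 0]
print(flat.shape)        # (4,)
```

With the ndarray version, X[:, 0], X[:, 1] and kmeans.labels_ all have the same 1D shape, so they can be passed straight to plt.scatter.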