
I'm working on a scatter plot of clusters.
With sklearn.cluster.KMeans I get a list of labels (kmeans.labels_) with the same length as my 2D matrix X...

Running

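import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

k = 5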
df = pd.read_csv('data_latlong.csv')
lat = df['Lat'].values
long = df['Long'].values

X = np.matrix(list(zip(lat, long)))
kmeans = KMeans(n_clusters=k).fit(X)

plt.figure(figsize=(10, 10))
plt.scatter(X[:, 0], X[:, 1], c=kmeans.labels_)
plt.title("n = 5")
plt.show()

Gives me the following ValueError:
ValueError: 'c' argument has 3909 elements, which is not acceptable for use with 'x' with size 3909, 'y' with size 3909.

Any suggestions on how to handle that?
Thanks!

Solution - preparing X the right way:
Instead of
X = np.matrix(list(zip(lat, long)))
I used
X = np.array([lat, long]).T

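.T transposes the (2, 3909) array into a (3909, 2) one (instead of using zip()), and with that X[:, 0] and X[:, 1] have the right shape! The part that matters is np.array instead of np.matrix: a np.matrix is always 2D, so X[:, 0] came back as a (3909, 1) column, whereas slicing a column out of a 2D np.array gives a (3909,) vector.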
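A minimal numpy sketch of why the fix works, using a few made-up coordinates in place of data_latlong.csv (the lat and long values below are hypothetical). It compares the column shape you get from np.matrix with the one from a plain np.array, and checks that building the array from zip() gives the same result as the .T version.

```python
import numpy as np

# Hypothetical coordinates standing in for df['Lat'].values and df['Long'].values
lat = np.array([52.52, 48.14, 50.94, 53.55, 51.23])
long = np.array([13.40, 11.58, 6.96, 9.99, 6.78])

# Original approach: np.matrix is always 2D, so a column slice stays 2D
X_matrix = np.matrix(list(zip(lat, long)))
col_matrix = X_matrix[:, 0]

# Fixed approach: stack as rows (2, n), then transpose to (n, 2)
X_array = np.array([lat, long]).T
col_array = X_array[:, 0]

# A plain array built from zip() is the same (n, 2) array
X_zip = np.array(list(zip(lat, long)))

print(col_matrix.shape)                # (5, 1)
print(col_array.shape)                 # (5,)
print(np.array_equal(X_array, X_zip))  # True
```

So np.array(list(zip(lat, long))) would have worked too; it is dropping np.matrix that gives 1D columns.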

lorny

1 Answer


TL;DR: try c=kmeans.labels_.reshape(kmeans.labels_.shape[0]). This will convert the labels from a (3909,1) array to a (3909,) vector.
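A small numpy sketch of what the TL;DR does, with a hypothetical (6, 1) label array standing in for kmeans.labels_. reshape() with the row count, reshape(-1) and ravel() all produce the same 1D vector; the last two avoid spelling out the length.

```python
import numpy as np

# Hypothetical cluster labels stored as a (6, 1) column instead of a (6,) vector
labels_2d = np.array([[0], [2], [1], [1], [0], [2]])

# TL;DR approach: reshape to the number of rows
v1 = labels_2d.reshape(labels_2d.shape[0])
# Equivalent forms that do not need the length written out
v2 = labels_2d.reshape(-1)
v3 = labels_2d.ravel()

print(labels_2d.shape, v1.shape)                          # (6, 1) (6,)
print(np.array_equal(v1, v2) and np.array_equal(v1, v3))  # True
```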

What you've done works for me with the sklearn "iris" dataset:

from sklearn import datasets
from sklearn.cluster import KMeans
from matplotlib import pyplot as plt

irises = datasets.load_iris()
X = irises['data']
clust = KMeans(n_clusters=3).fit(X)

plt.figure(figsize=(10, 10))
plt.scatter(X[:, 0], X[:, 1], c=clust.labels_)

print(X.shape)             # (150, 4)
print(clust.labels_.shape) # (150,)


Note that my clust.labels_ is a (150,) vector. If I instead reshape it to a (150, 1) array and pass that, I get the same error you do:

c_bad = clust.labels_.reshape((150,1))
plt.scatter(X[:, 0], X[:, 1], c=c_bad) # fails

So I suspect we have different versions of sklearn: mine returns the labels as a 1D vector, whereas yours returns them as a 2D array. The fix would be to go the opposite way and reshape your labels from an array to a vector:

plt.scatter(X[:, 0], X[:, 1], c=kmeans.labels_.reshape(3909))
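Because the extra dimension can sit in the coordinates or in the labels, one option is to flatten all three inputs before plotting, which also avoids hard-coding 3909. The helper as_vector below is a hypothetical name used only for illustration; it relies on np.asarray, which turns a np.matrix into a plain 2D ndarray so that ravel() can reduce it to 1D.

```python
import numpy as np

def as_vector(a):
    """Hypothetical helper: return a 1D ndarray for a vector, an (n, 1) column or an np.matrix column."""
    # np.asarray drops the np.matrix subclass, so ravel() can go down to 1D
    return np.asarray(a).ravel()

# Hypothetical inputs reproducing both shape problems from this thread
X = np.matrix([[52.52, 13.40], [48.14, 11.58], [50.94, 6.96]])
labels = np.array([[1], [0], [1]])

x, y, c = as_vector(X[:, 0]), as_vector(X[:, 1]), as_vector(labels)
print(x.shape, y.shape, c.shape)  # (3,) (3,) (3,)
# x, y and c can then be passed on as plt.scatter(x, y, c=c)
```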

Also see

butterflyknife
    Thanks a lot! I have version 0.21.3 of sklearn, and the labels are in a vector too. My mistake was in how I prepared X :S Both X[:, 0] and X[:, 1] had the shape (3909,1)... – lorny Dec 03 '19 at 13:30