MeanShift Clustering from csv file

Question

I have a CSV file called age-average. It has 4 columns: userId, average, age, count. I want to cluster my users by average and age. Here is my code:

import csv
import pandas as pd
import numpy as np
from sklearn.cluster import MeanShift
import matplotlib.pyplot as plt

df = pd.read_csv('/age-average.csv')
csv_file = open('/age-average.csv')
csv_reader = csv.reader(csv_file, delimiter=',')
next(csv_reader)

for row in csv_reader:
    userID,average,age,count = row
    plt.scatter(age,average)

plt.show()

It shows the graph, so everything is OK so far. However, when I try to use the ms.fit() function I always get an error:

ms =  MeanShift()
ms.fit(df[['average','age']])
labels = ms.labels
cluster_centers = ms.cluster_centers_

print(cluster_centers)

n_clusters_ = len(np.unique(labels))

print("The number of estimated clusters ", n_clusters_)

colors = 10*['r.','g.','c.','k.','y.','m.']

print(colors)

for i in range(len(age,average)):
    plt.plot(int(age[i]), float(average[i]), colors[labels[i]], markersize = 10)
plt.scatter(cluster_centers[:,0],cluster_centers[:,1],
            marker="x", s= 150, linewidths = 5, zorder=10)
plt.show()

What should I write instead of df[['average','age']] inside ms.fit()? Does anybody have an idea? I am confused. Thanks!

I get this error:

Traceback (most recent call last):
  File "example.py", line 20, in <module>
    labels = ms.labels
AttributeError: 'MeanShift' object has no attribute 'labels'
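The traceback points at the attribute access, not at ms.fit(). In scikit-learn, attributes learned during fitting end with an underscore, so the labels live in `ms.labels_` (just as the centers live in `ms.cluster_centers_`). Below is a minimal sketch of that fix. The DataFrame is synthetic stand-in data invented for illustration, since the real age-average.csv is not shown; only the column names come from the question.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import MeanShift

# Synthetic stand-in for age-average.csv (made-up values, not the asker's data).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "userId": np.arange(60),
    "average": np.concatenate([rng.normal(2.0, 0.1, 30), rng.normal(4.0, 0.1, 30)]),
    "age": np.concatenate([rng.normal(20, 1, 30), rng.normal(50, 1, 30)]),
    "count": rng.integers(1, 100, 60),
})

ms = MeanShift()
ms.fit(df[["average", "age"]])  # passing the two DataFrame columns is accepted

# Fitted attributes carry a trailing underscore: labels_, not labels.
labels = ms.labels_
cluster_centers = ms.cluster_centers_
n_clusters = len(np.unique(labels))

print(cluster_centers)
print("The number of estimated clusters:", n_clusters)
```

With this change the call `ms.fit(df[['average','age']])` from the question can stay as it is; only the `labels = ms.labels` line needs to become `labels = ms.labels_`.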

Maybe df[['average','age']].values, so you pass a NumPy array instead of a pandas object? Tough to say without seeing the error. — Erotemic, Dec 14 '16 at 23:13
I edited my question. I get an error like 'MeanShift' object has no attribute 'labels' when I try df[['average','age']].values as well. — berkt, Dec 14 '16 at 23:21
ms.fit takes two parameters: the list of features and the list of labels. You need to separate your data into a feature set (X) and a target label set (y). — Erotemic, Dec 15 '16 at 03:30
I solved it by converting the DataFrame to a NumPy array with this one-line code: `numpyMatrix = df.as_matrix()`, and then passing numpyMatrix to ms.fit like `ms.fit(numpyMatrix)` — berkt, Dec 15 '16 at 15:37
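A few notes on the comments above, with a small sketch. MeanShift is unsupervised: its `fit(X, y=None)` accepts `y` only for API consistency and ignores it, so no separate label set is needed. `DataFrame.as_matrix()` was deprecated and then removed in pandas 1.0; `to_numpy()` is the current equivalent. Converting the whole frame also feeds userId and count into the clustering, so the sketch below selects only the two wanted columns. The six rows and the `bandwidth=5.0` value are made up for illustration and are not taken from the question.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import MeanShift

# Tiny made-up table with the question's four columns (hypothetical values).
df = pd.DataFrame({
    "userId": [1, 2, 3, 4, 5, 6],
    "average": [2.0, 2.1, 1.9, 4.0, 4.1, 3.9],
    "age": [20, 21, 19, 50, 51, 49],
    "count": [5, 80, 12, 7, 300, 40],
})

# Keep only the clustering features; column order fixes the meaning of
# cluster_centers_[:, 0] (age) and cluster_centers_[:, 1] (average).
X = df[["age", "average"]].to_numpy()

# bandwidth is an assumed value chosen for this toy data; leaving it out
# lets scikit-learn estimate one.
ms = MeanShift(bandwidth=5.0)
ms.fit(X)  # no y argument needed

labels = ms.labels_
centers = ms.cluster_centers_

for (age, average), label in zip(X, labels):
    print(f"age={age:.0f} average={average:.1f} -> cluster {label}")
```

Putting age first matches the question's plot, which draws age on the x-axis and then plots `cluster_centers[:,0]` as the x-coordinate of each center. Fitting on `['average','age']` instead would swap the two coordinates of the plotted centers.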
