I am new to data science and Python, and I am trying to use KMeans from sklearn. I have call records and want to find the centroids of the call locations. It works for one phone number, but not for 10: when I use a for loop, I get the error "could not convert string to float: 'cd9f3b1a-2eb8-4cdb-86d1-5d4c2740b1dc'".
For one phone number it works:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

df = pd.read_csv('Datasets/CDR.csv')
df.CallDate = pd.to_datetime(df.CallDate)
df.CallTime = pd.to_timedelta(df.CallTime)
df.Duration = pd.to_timedelta(df.Duration)
in_numbers = df.In.unique().tolist()
in_numbers
user1 = df[(df.In == in_numbers[0])]
user1 = user1[(user1.DOW == 'Sat') | (user1.DOW == 'Sun')]
user1 = user1[(user1.CallTime < "06:00:00") | (user1.CallTime > "22:00:00")]
fig = plt.figure()
ax = fig.add_subplot(111)
ax.scatter(user1.TowerLon,user1.TowerLat, c='g', marker='o', alpha=0.2)
ax.set_title('Weekend Calls (<6am or >10p)')
user1 = pd.concat([user1.TowerLon, user1.TowerLat], axis = 1)
model = KMeans(n_clusters = 2)
labels = model.fit_predict(user1)
centroids = model.cluster_centers_
print(centroids)
ax.scatter(centroids[:,0], centroids[:,1], marker='x', c='red', alpha=0.5, linewidths=3, s=169)
plt.show()
But when I put it in a loop, I get the error:
locations = []
for i in range(10):
    user = df[(df.In == in_numbers[i])]
    user.plot.scatter(x='TowerLon', y='TowerLat', c='purple', alpha=0.1, title='Call Locations', s = 30)
    user = user[(user.DOW == 'Sat') | (user.DOW == 'Sun')]
    user = user[(user.CallTime < "06:00:00") | (user.CallTime > "22:00:00")]
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(user.TowerLon,user.TowerLat, c='g', marker='o', alpha=0.2)
    ax.set_title('Weekend Calls (<6am or >10p)')
    model = KMeans(n_clusters = 2)
    labels = model.fit_predict(user)
    centroids = model.cluster_centers_
    ax.scatter(centroids[:,0], centroids[:,1], marker='x', c='red', alpha=0.5, linewidths=3, s=169)
    locations.append(centroids)
    plt.show()
Where is my mistake? Thank you
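
One difference I can see between the two versions: the single-number code reduces user1 to TowerLon and TowerLat before calling fit_predict, while the loop passes the whole filtered DataFrame, which presumably still contains a string column holding values like the UUID in the error. Below is a minimal sketch of the loop with that selection step added. It runs on made-up data, not my CSV; the `CallID` column name and all the values are assumptions used only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical stand-in for the CDR data: a string column plus tower coordinates.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "In": np.repeat(["111", "222"], 20),
    "CallID": [f"id-{k}" for k in range(40)],  # assumed name; plays the role of the UUID column
    "TowerLon": rng.normal(-96.8, 0.05, 40),
    "TowerLat": rng.normal(32.8, 0.05, 40),
})

locations = []
for number in df.In.unique()[:10]:
    user = df[df.In == number]
    # Keep only the numeric coordinate columns, as the single-number version does.
    coords = user[["TowerLon", "TowerLat"]]
    model = KMeans(n_clusters=2, n_init=10, random_state=0)
    model.fit(coords)
    locations.append(model.cluster_centers_)

print(len(locations), locations[0].shape)
```

Each entry of `locations` is then an array with one row per cluster and one column per coordinate (longitude, latitude).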