
I am new to data science and Python, and I am trying to use KMeans from sklearn. I have call records and want to find the centroids of the call locations. It works for one phone number, but not for 10: when I use a for loop, I get the error "could not convert string to float: 'cd9f3b1a-2eb8-4cdb-86d1-5d4c2740b1dc'".

For one phone number, it works:

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

df = pd.read_csv('Datasets/CDR.csv')
df.CallDate = pd.to_datetime(df.CallDate)
df.CallTime = pd.to_timedelta(df.CallTime)
df.Duration = pd.to_timedelta(df.Duration)

in_numbers = df.In.unique().tolist()
in_numbers

user1 = df[(df.In == in_numbers[0])]

user1 = user1[(user1.DOW == 'Sat') | (user1.DOW == 'Sun')]
user1 = user1[(user1.CallTime < "06:00:00") | (user1.CallTime > "22:00:00")]

fig = plt.figure()
ax = fig.add_subplot(111)
ax.scatter(user1.TowerLon,user1.TowerLat, c='g', marker='o', alpha=0.2)
ax.set_title('Weekend Calls (<6am or >10p)')

user1 = pd.concat([user1.TowerLon, user1.TowerLat], axis = 1)

model = KMeans(n_clusters = 2)
labels = model.fit_predict(user1)

centroids = model.cluster_centers_
print(centroids)
ax.scatter(centroids[:,0], centroids[:,1], marker='x', c='red', alpha=0.5, linewidths=3, s=169)
plt.show()

But when I put it in a loop, I get the error:

locations = []

for i in range(10):
    user = df[(df.In == in_numbers[i])]
    user.plot.scatter(x='TowerLon', y='TowerLat', c='purple', alpha=0.1, title='Call Locations', s = 30)
    user = user[(user.DOW == 'Sat') | (user.DOW == 'Sun')]
    user = user[(user.CallTime < "06:00:00") | (user.CallTime > "22:00:00")]

    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(user.TowerLon,user.TowerLat, c='g', marker='o', alpha=0.2)
    ax.set_title('Weekend Calls (<6am or >10p)')

    model = KMeans(n_clusters = 2)
    labels = model.fit_predict(user)
    centroids = model.cluster_centers_
    ax.scatter(centroids[:,0], centroids[:,1], marker='x', c='red', alpha=0.5, linewidths=3, s=169)
    locations.append(centroids)
plt.show()

Where is my mistake? Thank you.

CDR.csv

John
  • The exception should also tell you the line where that's happening. Try adding a `print` before that line to see the values of everything you're using. Right now, all we can tell is what the error message is saying: somewhere in that loop there's a value that's expected to be a number, but it's actually this random string that can't be converted to a float. Maybe `in_numbers` has a value that's not a number? Try `print(in_numbers)` before the loop to verify that. – tomas May 31 '18 at 10:08
  • Otherwise, please upload the data file `Datasets/CDR.csv`, so people can try to reproduce and give you more specific help. – tomas May 31 '18 at 10:09
  • What is the question? Do you not understand why `'cd9f3b1a-2eb8-4cdb-86d1-5d4c2740b1dc'` cannot be converted to float? Or you don't understand why there is `'cd9f3b1a-2eb8-4cdb-86d1-5d4c2740b1dc'` in the first place? You have to first understand your data and then start writing code. – zvone May 31 '18 at 10:20
  • I think the problem is in this line: `labels = model.fit_predict(user)` – John May 31 '18 at 14:14
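
The debugging advice in the comments can be made concrete by listing which columns of the frame passed to `fit_predict` are not numeric. The following is only a minimal sketch on a tiny synthetic frame; the `TowerID` column name and all values other than the string from the error message are assumptions, not taken from the real CDR.csv.

```python
import pandas as pd

# Synthetic stand-in for one user's call records.
# Assumption: the real file has a string ID column; "TowerID" is a hypothetical name.
user = pd.DataFrame({
    "In": [4638472273, 4638472273, 4638472273],
    "TowerID": ["cd9f3b1a-2eb8-4cdb-86d1-5d4c2740b1dc", "tower-b", "tower-c"],
    "DOW": ["Sat", "Sun", "Sat"],
    "TowerLon": [-96.9, -96.8, -96.7],
    "TowerLat": [32.7, 32.8, 32.9],
})

# Columns that KMeans cannot convert to float.
non_numeric = user.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)

# Keep only the coordinate columns before clustering.
features = user[["TowerLon", "TowerLat"]]
print(features.dtypes)
```

If any string column shows up in `non_numeric`, passing the whole frame to KMeans would raise the same kind of "could not convert string to float" error.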

1 Answer

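I had left out one line in the loop. Without it, the whole DataFrame, including its string columns, was passed to KMeans instead of just the two coordinate columns: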

user = pd.concat([user.TowerLon, user.TowerLat], axis = 1)
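
For reference, here is a rough sketch of what the loop might look like with that selection in place. It runs on synthetic data rather than CDR.csv, so the phone numbers, the `TowerID` column, and the coordinates are all made up for illustration, and the plotting calls are left out to keep it short.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic call records (assumption: stand-in for Datasets/CDR.csv).
rng = np.random.default_rng(0)
n = 40
df = pd.DataFrame({
    "In": np.repeat([1111111111, 2222222222], n // 2),
    "TowerID": ["tower-%d" % i for i in range(n)],  # hypothetical string column
    "DOW": ["Sat", "Sun"] * (n // 2),
    "CallTime": pd.to_timedelta(["05:00:00", "23:00:00"] * (n // 2)),
    "TowerLon": rng.normal(-96.8, 0.1, n),
    "TowerLat": rng.normal(32.8, 0.1, n),
})

in_numbers = df.In.unique().tolist()
locations = []

for number in in_numbers:
    user = df[df.In == number]
    # Weekend calls before 6am or after 10pm, as in the question.
    user = user[user.DOW.isin(["Sat", "Sun"])]
    early = user.CallTime < pd.Timedelta("06:00:00")
    late = user.CallTime > pd.Timedelta("22:00:00")
    user = user[early | late]

    # The missing step: keep only the numeric coordinate columns.
    coords = user[["TowerLon", "TowerLat"]]

    model = KMeans(n_clusters=2, n_init=10, random_state=0)
    model.fit(coords)
    locations.append(model.cluster_centers_)

print(len(locations))
```

Each entry of `locations` is a 2x2 array holding the longitude and latitude of the two centroids for one phone number.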

Thanks to everyone.

John