Sklearn can't convert string to float

Question

I'm using Sklearn as a machine learning tool, but every time I run my code, it gives this error:

Traceback (most recent call last):
  File "C:\Users\FakeUserMadeUp\Desktop\Python\Machine Learning\MachineLearning.py", line 12, in <module>
    model.fit(X_train, Y_train)
  File "C:\Users\FakeUserMadeUp\AppData\Roaming\Python\Python37\site-packages\sklearn\tree\_classes.py", line 942, in fit
    X_idx_sorted=X_idx_sorted,
  File "C:\Users\FakeUserMadeUp\AppData\Roaming\Python\Python37\site-packages\sklearn\tree\_classes.py", line 166, in fit
    X, y, validate_separately=(check_X_params, check_y_params)
  File "C:\Users\FakeUserMadeUp\AppData\Roaming\Python\Python37\site-packages\sklearn\base.py", line 578, in _validate_data
    X = check_array(X, **check_X_params)
  File "C:\Users\FakeUserMadeUp\AppData\Roaming\Python\Python37\site-packages\sklearn\utils\validation.py", line 746, in check_array
    array = np.asarray(array, order=order, dtype=dtype)
  File "C:\Users\FakeUserMadeUp\AppData\Roaming\Python\Python37\site-packages\pandas\core\generic.py", line 1993, in __array__
    return np.asarray(self._values, dtype=dtype)
ValueError: could not convert string to float: 'Paris'

Here is the code, and down below there's my dataset:

(I've tried multiple different datasets. Also, this dataset is a .txt file because I made it myself and am too dumb to convert it to CSV.)

    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier as dtc
    from sklearn.model_selection import train_test_split as tts

    city_data = pd.read_csv('TimeZoneTable.txt')
    X = city_data.drop(columns=['Country'])
    Y = city_data['Country']

    X_train, X_test, Y_train, Y_test = tts(X, Y, test_size = 0.2)

    model = dtc()
    model.fit(X_train, Y_train)
    predictions = model.predict(X_test)

    print(Y_test)
    print(predictions)

Dataset:

    CityName,Country,Latitude,Longitude,TimeZone
    Moscow,Russia,55.45'N,37.37'E,3
    Vienna,Austria,48.13'N,16.22'E,2
    Barcelona,Spain,41.23'N,2.11'E,2
    Madrid,Spain,40.25'N,3.42'W,2
    Lisbon,Portugal,38.44'N,9.09'W,1
    London,UK,51.30'N,0.08'W,1
    Cardiff,UK,51.29'N,3.11'W,1
    Edinburgh,UK,55.57'N,3.11'W,1
    Dublin,Ireland,53.21'N,6.16'W,1
    Paris,France,48.51'N,2.21'E,2

Does this answer your question? [RandomForestClassfier.fit(): ValueError: could not convert string to float](https://stackoverflow.com/questions/30384995/randomforestclassfier-fit-valueerror-could-not-convert-string-to-float) — Dilara Gokay, May 08 '22 at 12:03
Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. — Community, May 08 '22 at 13:37

score 0 · Answer 1 · answered May 13 '22 at 14:38

Machine learning algorithms, including the decision tree you are using and the random forest from the linked question, work exclusively with numeric input. To improve a model it is often also recommended to normalize the input features, for example to the range [-1, 1], which produces decimal numbers; hence the expectation of a float.
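To make that concrete, here is a minimal sketch of both points: the conversion to float that fails on a city name, and a rescaling of a numeric column to [-1, 1]. The small arrays are made up for illustration from values in the question, and `MinMaxScaler` is only one way to normalize, not necessarily the one the linked question uses.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# scikit-learn converts the features to a float array before fitting.
# A city name cannot be converted, so NumPy raises the ValueError from the traceback.
try:
    np.asarray(["Paris", "Madrid"], dtype=float)
    message = None
except ValueError as err:
    message = str(err)

# Numeric columns convert cleanly and can then be rescaled to [-1, 1].
time_zones = np.array([[3.0], [2.0], [1.0]])  # TimeZone values from the question
scaled = MinMaxScaler(feature_range=(-1, 1)).fit_transform(time_zones)
print(message)
print(scaled.ravel())
```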

In your case, your dataframe contains several string columns: CityName, Latitude, and Longitude are all read as strings. As Dilara Gokay said, you first need to transform your strings into numbers. To do so, use what is called a one-hot encoder (`OneHotEncoder` in sklearn). I'll let you follow this tutorial if you don't know how to do it.
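As a rough sketch of what that could look like on the question's data (not the only way to do it): the coordinates are parsed into signed floats and `CityName` is one-hot encoded inside a pipeline. The `to_signed` helper is hypothetical, and reading `55.45'N` as the plain decimal 55.45 is an assumption about the notation. Only four rows are inlined here to keep the example self-contained.

```python
import io

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# A few rows from the question's dataset, inlined so the sketch runs on its own.
raw = io.StringIO(
    "CityName,Country,Latitude,Longitude,TimeZone\n"
    "Moscow,Russia,55.45'N,37.37'E,3\n"
    "Madrid,Spain,40.25'N,3.42'W,2\n"
    "London,UK,51.30'N,0.08'W,1\n"
    "Paris,France,48.51'N,2.21'E,2\n"
)
city_data = pd.read_csv(raw)


def to_signed(coord):
    """Hypothetical helper: turn "3.42'W" into -3.42 (south and west are negative).

    Assumption: the number before the apostrophe is treated as a plain decimal.
    """
    value, hemisphere = coord.split("'")
    return -float(value) if hemisphere in ("S", "W") else float(value)


city_data["Latitude"] = city_data["Latitude"].map(to_signed)
city_data["Longitude"] = city_data["Longitude"].map(to_signed)

X = city_data.drop(columns=["Country"])
Y = city_data["Country"]

# One-hot encode the remaining string column; numeric columns pass through unchanged.
preprocess = ColumnTransformer(
    [("city", OneHotEncoder(handle_unknown="ignore"), ["CityName"])],
    remainder="passthrough",
)
model = make_pipeline(preprocess, DecisionTreeClassifier(random_state=0))
model.fit(X, Y)
predictions = model.predict(X)
print(list(predictions))
```

Because every city appears only once, the one-hot columns are all zeros for a city the model has not seen, so in practice latitude, longitude, and time zone carry the prediction. The sketch fits and predicts on the same rows only to show that the string error is gone; with the full file you would keep the `train_test_split` call from the question.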
