I have this data:
print training_data
print labels
# prints
[[1, 0, 1, 1], [1, 1, 1, 1], [1, 0, 1, 1], [1, 1, 1, 0], [1, 1, 0, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 0, 0], [1, 1, 1, 1], [1, 0, 1, 1]]
['a', 'b', 'a', 'b', 'a', 'b', 'b', 'a', 'a', 'a', 'b']
I am trying to feed it to a RandomForestClassifier from the sklearn Python library:
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators=10)
classifier.fit(training_data, labels)
But I receive this error:
Traceback (most recent call last):
  File "learn.py", line 52, in <module>
    main()
  File "learn.py", line 48, in main
    classifier = train_classifier()
  File "learn.py", line 33, in train_classifier
    classifier.fit(training_data, labels)
  File "/Library/Python/2.7/site-packages/scikit_learn-0.14_git-py2.7-macosx-10.8-intel.egg/sklearn/ensemble/forest.py", line 348, in fit
    y = np.ascontiguousarray(y, dtype=DOUBLE)
  File "/Library/Python/2.7/site-packages/numpy-1.8.0.dev_bbcfcf6_20130307-py2.7-macosx-10.8-intel.egg/numpy/core/numeric.py", line 419, in ascontiguousarray
    return array(a, dtype, copy=False, order='C', ndmin=1)
ValueError: could not convert string to float: a
My guess is that I am not formatting this data correctly for fitting, but I cannot tell from the documentation what the correct format is.
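The traceback suggests that this build of sklearn tries to convert the labels to floats, so one workaround I could try is mapping each string label to an integer before fitting. This is only a minimal sketch, assuming that numeric class labels are acceptable to fit; the names classes, label_to_index and numeric_labels are my own, for illustration:

```python
# Hypothetical workaround: replace each distinct string label with an integer index.
labels = ['a', 'b', 'a', 'b', 'a', 'b', 'b', 'a', 'a', 'a', 'b']

# Sorted list of the distinct labels; a label's position is its integer code.
classes = sorted(set(labels))
label_to_index = dict((label, i) for i, label in enumerate(classes))
numeric_labels = [label_to_index[label] for label in labels]

# Assumption: numeric_labels could then be passed in place of labels, e.g.
#   classifier.fit(training_data, numeric_labels)
# and a prediction mapped back to its string with classes[int(prediction)].
```

I would still like to know whether string labels are supposed to work directly.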
This seems like a pretty basic issue. Does anyone know the answer?