I have a pandas data frame that looks like this:
   corpus                        tfidf                       labels
0  dfnkdfnkf asdfhedfh ajdladja  [0.0, 0.0, 0.0, 0.01, 0.8]  60
1  dfnkdfnkf asdfhedfh ajdladja  [0.0, 0.0, 0.0, 0.01, 0.8]  73
2  dfnkdfnkf asdfhedfh ajdladja  [0.0, 0.0, 0.0, 0.01, 0.8]  61
My desired output is this:
   corpus                        tfidf                     labels
0  dfnkdfnkf asdfhedfh ajdladja  0.0, 0.0, 0.0, 0.01, 0.8  60
1  dfnkdfnkf asdfhedfh ajdladja  0.0, 0.0, 0.0, 0.01, 0.8  73
2  dfnkdfnkf asdfhedfh ajdladja  0.0, 0.0, 0.0, 0.01, 0.8  61
I want to unpack the lists in the tfidf column into a NumPy array so that I can train a decision tree classifier.
x = df['tfidf'].values
y = df['labels'].values
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)
from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier(criterion='entropy', random_state=0)
classifier.fit(x_train, y_train)
When I tried the code above I got an error:
TypeError                                 Traceback (most recent call last)
TypeError: float() argument must be a string or a number, not 'list'
The above exception was the direct cause of the following exception:
ValueError                                Traceback (most recent call last)
<ipython-input-103-8aa769130bba> in <module>()
      1 from sklearn.tree import DecisionTreeClassifier
      2 classifier= DecisionTreeClassifier(criterion='entropy', random_state=0)
----> 3 classifier.fit(x_train, y_train)
What can I do to get the data frame ready for training?
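For reference, here is a minimal sketch of one possible direction. It assumes every list in tfidf has the same length and that the cells hold real Python lists rather than strings. The toy frame is made up to mirror the one above (with a fourth row added so the split has something to work with). The error seems to come from df['tfidf'].values being a 1-D object array of lists, whereas scikit-learn expects a 2-D numeric array, so the sketch stacks the lists into one array first:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy frame mirroring the one in the question (values are illustrative only).
df = pd.DataFrame({
    "corpus": ["dfnkdfnkf asdfhedfh ajdladja"] * 4,
    "tfidf": [[0.0, 0.0, 0.0, 0.01, 0.8]] * 4,
    "labels": [60, 73, 61, 60],
})

# Stack the per-row lists into a 2-D float array of shape (n_rows, n_features).
# Assumption: all lists have equal length; ragged lists would not stack cleanly.
x = np.array(df["tfidf"].tolist(), dtype=float)
y = df["labels"].values

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=0
)

classifier = DecisionTreeClassifier(criterion="entropy", random_state=0)
classifier.fit(x_train, y_train)
```

If the tfidf cells are actually strings such as "[0.0, 0.0, ...]" (for example after reading from CSV), they would need to be parsed into lists before this step.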