I want to use sklearn to make some predictions, and I stored my data in a DataFrame.
from pandas import DataFrame
Data = DataFrame(columns=columns, index=range(1, 501))
The data itself has no problems.
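Even when the values look right, the column dtype is worth checking. A DataFrame created with only `columns` and `index`, as in the line above, starts out with all-NaN columns of dtype object, and filling it cell by cell does not change that dtype. The sketch below assumes pandas is installed and uses hypothetical column names (`f1`, `f2`, `label`) and made-up rows, because the real `columns` list and data are not shown. `pd.to_numeric` is one way to recover numeric dtypes afterwards.

```python
import pandas as pd

# Hypothetical column names; the question's real `columns` list is not shown.
columns = ["f1", "f2", "label"]

# Creating the frame with only columns and an index gives all-NaN object columns.
data = pd.DataFrame(columns=columns, index=range(1, 4))
print(data.dtypes.tolist())

# Filling it cell by cell keeps the object dtype, even though the values are numbers.
rows = [(0.5, 1.2, -1), (0.3, 0.7, 1), (0.9, 0.1, 0)]
for idx, (f1, f2, label) in zip(data.index, rows):
    data.loc[idx, "f1"] = f1
    data.loc[idx, "f2"] = f2
    data.loc[idx, "label"] = label
print(data["label"].dtype)

# Convert every column to a numeric dtype before passing the data to sklearn.
numeric = data.apply(pd.to_numeric)
print(numeric.dtypes.to_dict())
```

Printing `Data.dtypes` on the real frame would show whether the same thing happened there.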
from sklearn.model_selection import train_test_split
Xtrain, Xtest, Ytrain, Ytest = train_test_split(Data[columns[0:5]], Data[columns[5:6]], test_size=0.25, random_state=33)
I also tried:
import numpy as np
Xtrain, Xtest, Ytrain, Ytest = train_test_split(np.array(Data[columns[0:5]]), np.array(Data[columns[5:6]]), test_size=0.25, random_state=33)
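from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
# Standardize features using statistics from the training set only
ss = StandardScaler()
Xtrain = ss.fit_transform(Xtrain)
Xtest = ss.transform(Xtest)
# Fit a logistic regression classifier on the scaled training data
lr = LogisticRegression()
lr.fit(Xtrain, Ytrain)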
The error message is:
Traceback (most recent call last):
  File "/Volumes/sogou_baidu.py", line 148, in <module>
    lr.fit(Xtrain,Ytrain)
  File "/Users/liumengyang/anaconda/lib/python3.5/site-packages/sklearn/linear_model/logistic.py", line 1143, in fit
    check_classification_targets(y)
  File "/Users/liumengyang/anaconda/lib/python3.5/site-packages/sklearn/utils/multiclass.py", line 173, in check_classification_targets
    raise ValueError("Unknown label type: %r" % y)
ValueError: Unknown label type: array([-1, -1, 1, -1, 1, -1, 0, -1, 1, -2, 0, -1, 1, 1, 0, -1, 1, 0, 1, -1, 0, 0, 1, -1, -1, 0, -1, -1, -1, 0, -1, -1, 0, 1, 0, -1, 1], dtype=object)
Normally, both arguments to lr.fit() should be plain arrays. But when the data comes from the DataFrame, the label array carries an extra "dtype=object". How can I solve this problem?
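The "dtype=object" in the message is the dtype of the label array itself, not an extra parameter. A minimal sketch, assuming scikit-learn and NumPy are installed, reproduces the check on the first twelve labels from the traceback using `sklearn.utils.multiclass.type_of_target`. That function reports an object array of non-string values as "unknown", which is the case `check_classification_targets` rejects. The feature matrix `X` is random and purely hypothetical; it only exists so that `fit()` has something to run on.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.multiclass import type_of_target

# Labels copied from the error message: integer values stored with dtype=object.
y_obj = np.array([-1, -1, 1, -1, 1, -1, 0, -1, 1, -2, 0, -1], dtype=object)

# Hypothetical features, only so that fit() has input of the right length.
rng = np.random.default_rng(33)
X = rng.normal(size=(len(y_obj), 5))

# sklearn treats an object array of non-string values as an "unknown" target.
print(type_of_target(y_obj))
try:
    LogisticRegression().fit(X, y_obj)
except ValueError as exc:
    print("fit with object labels failed:", exc)

# Casting to a real integer dtype turns the same values into a multiclass target.
y_int = y_obj.astype(np.int64)
print(type_of_target(y_int))

lr = LogisticRegression(max_iter=1000)
lr.fit(X, y_int)
print(lr.classes_)
```

Applied to the question's variables, the corresponding change would be something like `np.ravel(Ytrain).astype(int)` before calling `lr.fit`. This assumes every label is a whole number. `np.ravel` also flattens the single-column label frame into the 1-D vector that sklearn expects.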