Type error when using sklearn with a pandas DataFrame

Question

I want to use sklearn to make some predictions, and I stored my data in a DataFrame.

Data = DataFrame(columns = columns,index = range(1,501))

The data itself has no problems.

from sklearn.cross_validation import train_test_split
Xtrain,Xtest,Ytrain,Ytest = train_test_split(Data[columns[0:5]],Data[columns[5:6]],test_size = 0.25,random_state = 33)

I also tried:

Xtrain,Xtest,Ytrain,Ytest = train_test_split(np.array(Data[columns[0:5]]),np.array(Data[columns[5:6]]),test_size = 0.25,random_state = 33)
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
ss = StandardScaler()
Xtrain = ss.fit_transform(Xtrain)
Xtest = ss.transform(Xtest)
lr = LogisticRegression()
lr.fit(Xtrain,Ytrain)

and the error message is:

Traceback (most recent call last):

    File "/Volumes/sogou_baidu.py", line 148, in <module>
        lr.fit(Xtrain,Ytrain)
    File "/Users/liumengyang/anaconda/lib/python3.5/site-packages/sklearn/linear_model/logistic.py", line 1143, in fit
        check_classification_targets(y)
    File "/Users/liumengyang/anaconda/lib/python3.5/site-packages/sklearn/utils/multiclass.py", line 173, in check_classification_targets
        raise ValueError("Unknown label type: %r" % y)

    ValueError: Unknown label type: array([-1, -1, 1, -1, 1, -1, 0, -1, 1, -2, 0, -1, 1, 1, 0, -1, 1, 0, 1, -1, 0, 0, 1, -1, -1, 0, -1, -1, -1, 0, -1, -1, 0, 1, 0, -1, 1], dtype=object)

Normally the arguments to lr.fit() are two arrays, but now that I pass data taken from the DataFrame, the label array in the error shows an extra “dtype=object”. How can I solve this problem?

Print the whole stack trace of error. Also the `dtype=object` is not a parameter, it is just information about array on its left. — Vivek Kumar, Mar 01 '17 at 07:26
/Users/liumengyang/anaconda/lib/python3.5/site-packages/sklearn/utils/validation.py:515: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). ... ValueError: Unknown label type: array([-1, -1, 1, -1, 1, -1, 0, -1, 1, -2, 0, -1, 1, 1, 0, -1, 1, 0, 1, -1, 0, 0, 1, -1, -1, 0, -1, -1, -1, 0, -1, -1, 0, 1, 0, -1, 1], dtype=object) — Mengyang LIU, Mar 01 '17 at 08:02
This is just last line in stack trace. You should post complete one which starts from the line in your code and then goes all the way to error. — Vivek Kumar, Mar 01 '17 at 08:05
Traceback (most recent call last): File "", line 1, in runfile('/Volumes/sogou_baidu.py', wdir='/Volumes/predict') File "/Users/liumengyang/anaconda/lib/python3.5/site-packages/spyder/utils/site/sitecustomize.py", line 866, in runfile execfile(filename, namespace) File "/Users/liumengyang/anaconda/lib/python3.5/site-packages/spyder/utils/site/sitecustomize.py", line 102, in execfile exec(compile(f.read(), filename, 'exec'), namespace) — Mengyang LIU, Mar 01 '17 at 08:09
File "/Volumes/sogou_baidu.py", line 148, in lr.fit(Xtrain,Ytrain) File "/Users/liumengyang/anaconda/lib/python3.5/site-packages/sklearn/linear_model/logistic.py", line 1143, in fit check_classification_targets(y) File "/Users/liumengyang/anaconda/lib/python3.5/site-packages/sklearn/utils/multiclass.py", line 173, in check_classification_targets raise ValueError("Unknown label type: %r" % y) — Mengyang LIU, Mar 01 '17 at 08:10
ValueError: Unknown label type: array([-1, -1, 1, -1, 1, -1, 0, -1, 1, -2, 0, -1, 1, 1, 0, -1, 1, 0, 1, -1, 0, 0, 1, -1, -1, 0, -1, -1, -1, 0, -1, -1, 0, 1, 0, -1, 1], dtype=object) — Mengyang LIU, Mar 01 '17 at 08:10
Show some samples of y_train and which version of scikit are you using? — Vivek Kumar, Mar 01 '17 at 08:24
samples:R1 R2 R3 R4 R5 Y :-3 -1 0 0 0 -1; 0 -3 0 2 0 -1;0 1 1 2 -2 1...... — Mengyang LIU, Mar 01 '17 at 08:29
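
The `dtype=object` in the error and the DataConversionWarning quoted in the comments suggest two things to try: cast the label column to a real integer dtype, and pass it as a 1-D array rather than a column vector. The sketch below is only an illustration under assumptions. The random data is a made-up stand-in for the real file, the frame is built with `dtype=object` on purpose to mimic the situation in the question, and `train_test_split` is imported from `sklearn.model_selection`, which replaces `sklearn.cross_validation` in newer scikit-learn releases.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the question's data: five features and one label.
columns = ["R1", "R2", "R3", "R4", "R5", "Y"]
rng = np.random.RandomState(33)
values = rng.randint(-3, 4, size=(500, 5))   # features in -3..3
labels = rng.randint(-1, 2, size=(500, 1))   # labels in -1, 0, 1

# Assumption: the real columns hold generic Python objects (dtype=object),
# as the error message suggests. We force that here to mimic the problem.
Data = pd.DataFrame(np.hstack([values, labels]),
                    columns=columns, index=range(1, 501), dtype=object)
print(Data.dtypes)  # inspect the column dtypes before fitting

# Cast to numeric dtypes. Selecting a single column name (not a list of
# names) gives a 1-D Series instead of a one-column DataFrame.
X = Data[columns[0:5]].astype(float)
y = Data[columns[5]].astype(int)

Xtrain, Xtest, Ytrain, Ytest = train_test_split(
    X, y, test_size=0.25, random_state=33)

ss = StandardScaler()
Xtrain = ss.fit_transform(Xtrain)
Xtest = ss.transform(Xtest)

lr = LogisticRegression()
lr.fit(Xtrain, Ytrain)
predictions = lr.predict(Xtest)
```

If the labels are already in a NumPy array, `np.asarray(Ytrain).astype(int).ravel()` should have the same effect. Whether this resolves the original error depends on what the real `Y` column contains, which the question does not show in full.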

