I am getting a ValueError when running cross-validation with LogisticRegression.

Here's my dataset:

             sub1         sub2              sub3       sub4
pol_1     0.000000      0.000000            0.0      0.000000   
pol_2     0.000000      0.000000            0.0      0.000000   
pol_3     0.050000      0.000000            0.0      0.000000   
pol_4     0.000000      0.000000            0.0      0.000000   
pol_5     0.000000      0.000000            0.0      0.000000   
pol_6     0.000000      0.000000            0.0      0.000000   
pol_7     0.000000      0.000000            0.0      0.000000   
pol_8     0.000000      0.000000            0.0      0.000000   
pol_9     0.000000      0.000000            0.0      0.000000   
pol_10    0.000000      0.000000            0.0      0.032423   
pol_11    0.000000      0.000000            0.0      0.000000   
pol_12    0.000000      0.000000            0.0      0.000000   
pol_13    0.000000      0.000000            0.0      0.000000   
pol_14    0.000000      0.053543            0.0      0.000000   
pol_15    0.000000      0.000000            0.0      0.000000   
pol_16    0.000000      0.000000            0.0      0.000000   
pol_17    0.000000      0.000000            0.0      0.000000   
pol_18    0.000000      0.000000            0.0      0.053453   
pol_19    0.000000      0.058344            0.0      0.000000   
pol_20    0.054677      0.000000            0.0      0.000000

This is my code:

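from sklearn import model_selection
from sklearn.linear_model import LogisticRegression

array = df.values
X = array[:, 0:3]  # first three columns are the features
Y = array[:, 3]    # fourth column is the target
validation_size = 0.20
seed = 7
# Hold out 20% of the rows for validation
X_train, X_validation, Y_train, Y_validation = model_selection.train_test_split(
    X, Y, test_size=validation_size, random_state=seed)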

seed = 7
scoring = 'accuracy'

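# shuffle=True is needed for random_state to have an effect (newer scikit-learn versions require it)
kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=seed)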
cv_results = model_selection.cross_val_score(LogisticRegression(), X_train, Y_train, cv=kfold, scoring=scoring)
print(cv_results)

This gives me the following error:

ValueError: Unknown label type: 'continuous'

How can this issue be tackled?

I also looked through some related posts and found that the issue could be related to the data type, which in my case is:

print(df.dtypes)
print(X_train.dtype)

pol_1     float64
pol_2     float64
pol_3     float64
pol_4     float64
pol_5     float64
pol_6     float64
pol_7     float64
pol_8     float64
pol_9     float64
pol_10    float64
pol_11    float64
pol_12    float64
pol_13    float64
pol_14    float64
pol_15    float64
pol_16    float64
pol_17    float64
pol_18    float64
pol_19    float64
pol_20    float64
Length: 20, dtype: object
float64

I tried converting the data type of X_train and Y_train to string, but got the same error.

Thanks!

Miriam Farber
Mooni
  • LogisticRegression is actually a classifier. Are you solving a classification problem or a regression problem? Which value are you trying to predict? Do the predictions belong to a fixed set of classes (i.e. each sample must fall into one of them), or do you want to predict a number (like a stock price, rainfall, salary, etc.)? – Vivek Kumar Jul 10 '17 at 06:06

1 Answer

LogisticRegression is a classifier, so Y must hold discrete class labels, typically integers. In your data frame the Y column consists of non-integer floats, which scikit-learn treats as a continuous target, and hence you get this error.
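
As a minimal sketch of how to check this and one possible way around it: the snippet below uses scikit-learn's type_of_target to show how the target is interpreted, then turns the floats into class labels before fitting. The toy arrays and the "above the median" labelling rule are assumptions made up for illustration; they are not taken from the question, and the right rule depends on what the column actually means. If the value really is a quantity to predict, a regressor is the better fit.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.multiclass import type_of_target

# Toy data standing in for the question's data frame (values are made up).
rng = np.random.default_rng(7)
X = rng.random((40, 3))
y_float = rng.random(40) * 0.06  # non-integer floats, similar in scale to sub4

# scikit-learn inspects the target before fitting a classifier.
print(type_of_target(y_float))  # reported as a continuous target

# Fitting a classifier on such a target is rejected with a ValueError.
raised = False
try:
    LogisticRegression().fit(X, y_float)
except ValueError as err:
    raised = True
    print(err)

# Hypothetical labelling rule (an assumption): class 1 if above the median.
y_class = (y_float > np.median(y_float)).astype(int)
print(type_of_target(y_class))  # now a discrete (binary) target

# With discrete labels the classifier fits and predicts class labels.
model = LogisticRegression()
model.fit(X, y_class)
pred = model.predict(X[:5])
print(pred)
```

Running type_of_target on Y_train before calling cross_val_score is a quick way to confirm whether the labels are being read as classes or as continuous values.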

Miriam Farber