I'm trying to implement a logistic regression with scikit-learn. Currently I have a DataFrame that consists of 12 input variables and 1 output variable.

The output variable is binary valued, whereas the 12 input variables are not necessarily so.

Here is an example of how the input data is structured:

#PseudoCode (Y and X are pandas dataframes)
Y = 0, 1, 0, 1, 1, 1  # Output data
X =  A1: 1, 1, 2, 1, 2, 2 #Input Data
     B2: 45, 23, 12, 56, 23, 86
     ...
     L12: 4.2, 3.2, 1.2, 2.3, 2.3, 9.9

The following is then done with that data:

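from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X = X.astype(int)  # to make sure that the data is actually in int format
Y = Y.astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=.10, random_state=42)

xscaler = StandardScaler()
yscaler = StandardScaler()

pipe = Pipeline([('scaler', xscaler), ('logit', LogisticRegression())])
model = TransformedTargetRegressor(regressor=pipe, transformer=yscaler)
model.fit(X_train, y_train)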

This, however, raises the following error:

ValueError: Unknown label type: 'continuous'

Why does this happen even though the Y data is clearly binary valued?

krpytix
  • might be [this](https://stackoverflow.com/questions/41925157/logisticregression-unknown-label-type-continuous-using-sklearn-in-python) – Panda Jul 14 '21 at 12:11

1 Answer

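The problem is that you are scaling your labels y with a StandardScaler(). Wrapping the pipeline in TransformedTargetRegressor standardises y_train before it reaches LogisticRegression, so the labels 0 and 1 become non-integer floats, which the classifier rejects as 'continuous'.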
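To see the effect on the labels, here is a minimal sketch using the six example labels from the question. It relies on `type_of_target` from `sklearn.utils.multiclass`, which I believe is the same kind of check classifiers apply to their targets; treat that link as an assumption rather than a statement about the internals.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.utils.multiclass import type_of_target

# The example labels from the question.
y = np.array([0, 1, 0, 1, 1, 1])

# StandardScaler expects a 2-D input, so reshape to one column and flatten back.
y_scaled = StandardScaler().fit_transform(y.reshape(-1, 1)).ravel()

print(type_of_target(y))         # binary
print(type_of_target(y_scaled))  # continuous
```

After standardising, the two classes are represented by values near -1.41 and 0.71, which are no longer valid class labels.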

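y is a categorical variable that says whether a sample belongs to class 1 or 0, so it must not be scaled. Drop the TransformedTargetRegressor (it is meant for regression targets) and fit the pipeline directly on X_train and y_train.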
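A minimal sketch of the corrected setup, assuming synthetic data in place of the real DataFrame (the feature names `f0` to `f11`, the class sizes, and the `stratify` argument are illustrative choices, not part of the question). Only the features go through StandardScaler; the labels are passed to the pipeline unchanged.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data: 200 rows, 12 float features, binary target.
rng = np.random.default_rng(42)
labels = np.array([0] * 80 + [1] * 120)
features = rng.normal(size=(200, 12)) + 2.0 * labels[:, None]

X = pd.DataFrame(features, columns=[f"f{i}" for i in range(12)])
y = pd.Series(labels, name="target")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, random_state=42, stratify=y
)

# Scale the features inside the pipeline; leave the labels as 0/1.
pipe = Pipeline([("scaler", StandardScaler()), ("logit", LogisticRegression())])
pipe.fit(X_train, y_train)

predictions = pipe.predict(X_test)
accuracy = pipe.score(X_test, y_test)
print(accuracy)
```

X is kept as floats here: `X.astype(int)` would truncate values such as 4.2 to 4, and StandardScaler handles float columns directly.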

Antoine Dubuis