I have successfully built a logistic regression model using the train dataset, as shown below.

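from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler

# Separate the features from the target column
X = train.drop('y', axis=1)
y = train['y']

# Hold out half of the labeled data for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.5)

# Fit the scaler on the training split only, then apply it to both splits
scaler = StandardScaler()
scaler.fit(X_train)

X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

logreg1 = LogisticRegression()
logreg1.fit(X_train, y_train)

# Accuracy on the held-out split, and mean 5-fold CV accuracy on that same split
score = logreg1.score(X_test, y_test)
cvs = cross_val_score(logreg1, X_test, y_test, cv=5).mean()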

My problem is that I want to bring in the separate test dataset and predict its unknown y values. The test data has no y column. How can I predict y using this separate test dataset?

Jake Park
  • The purpose of the test set is to *test* the training. If you don't have those data labeled in the same form as the training data, then it's not a *test* set. – Prune Oct 29 '18 at 23:21
  • You might be confusing the validation set, the train set, and the test set; check any introductory regression tutorial on the net. – Curcuma_ Oct 29 '18 at 23:23
  • If you want to see the predicted values of your fitted model with any new input, use the `predict()` method of the LogisticRegression object, passing in something like `X_test`. – Mihai Chelaru Oct 29 '18 at 23:24
  • You know how on Kaggle they give you a train set and a test set? I have successfully trained on my training set above, but I want to predict the y values using the model that I've built from the training dataset. In the test data all the columns are the same, but there's no target column. How can I use the test dataset to predict the unknown y? – Jake Park Oct 29 '18 at 23:44

1 Answer

Use predict():

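# X_test must already be scaled with the scaler that was fitted on the training data
y_pred = logreg1.predict(X_test)
# Note: logreg1.score(X_test, y_pred) would always return 1.0, because it compares
# the predictions with themselves; a meaningful score needs the true labels.
print(y_pred)     # see the predictions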
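
For the Kaggle-style case described in the comments, where the separate test file has the same feature columns but no target, the whole flow might look like the sketch below. The `train` and `test` frames here are synthetic stand-ins, and the column names `f1`, `f2`, `f3` and the `submission` frame are hypothetical; in practice both frames would come from your own files. For brevity the model is fitted on all labeled rows, but the same steps apply to a model fitted on `X_train`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-ins for the labeled `train` and unlabeled `test` DataFrames.
rng = np.random.default_rng(0)
features = ["f1", "f2", "f3"]  # assumed column names
train = pd.DataFrame(rng.normal(size=(200, 3)), columns=features)
train["y"] = (train["f1"] + train["f2"] > 0).astype(int)
test = pd.DataFrame(rng.normal(size=(50, 3)), columns=features)  # no 'y' column

X = train.drop("y", axis=1)
y = train["y"]

# Fit the scaler and the model on the labeled data only.
scaler = StandardScaler().fit(X)
logreg1 = LogisticRegression().fit(scaler.transform(X), y)

# Put the test columns in the training order, then reuse the fitted scaler.
# Do not refit the scaler on the test data.
X_new = scaler.transform(test[X.columns])
y_pred = logreg1.predict(X_new)

# Attach the predictions to the test rows.
submission = test.assign(y=y_pred)
print(submission.head())
```

The key point is that the test features go through `scaler.transform`, not `scaler.fit`, so they are scaled with the statistics learned from the training data. Because the test set has no labels, there is nothing to pass to `score()`; accuracy can only be estimated on held-out labeled data.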
druskacik