Test Train Split: error

Question

How can I split my df and predict on the test set? This is my code:

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X = Final_df.drop('survived', axis=1)
Y = Final_df['survived']

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=123)
logreg = LogisticRegression()
logreg.fit(X_train, Y_train)
train, test = train_test_split(Final_df, test_size=0.2)
Y_pred = logreg.predict(Y_test)

I'm getting an error like this:

ValueError                                Traceback (most recent call last)
<ipython-input-38-f81a6db0e9ae> in <module>()
----> 1 Y_pred=logreg.predict(Y_test)

~\Anaconda3\lib\site-packages\sklearn\linear_model\base.py in predict(self, X)
    322             Predicted class label per sample.
    323         """
--> 324         scores = self.decision_function(X)
    325         if len(scores.shape) == 1:
    326             indices = (scores > 0).astype(np.int)

~\Anaconda3\lib\site-packages\sklearn\linear_model\base.py in decision_function(self, X)
    298                                  "yet" % {'name': type(self).__name__})
    299 
--> 300         X = check_array(X, accept_sparse='csr')
    301 
    302         n_features = self.coef_.shape[1]

~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    439                     "Reshape your data either using array.reshape(-1, 1) if "
    440                     "your data has a single feature or array.reshape(1, -1) "
--> 441                     "if it contains a single sample.".format(array))
    442             array = np.atleast_2d(array)
    443             # To ensure that array flags are maintained

ValueError: Expected 2D array, got 1D array instead:
array=[0 0 0 0 0 0 0 1 0 1 0 1 1 0 1 1 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 1 1 1 1
 0 0 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 0 0 1 1 1 0 0 0 0 1 0 0 0 0 0 0 0
 0 0 0 0 0 0 1 1 0 0 1 0 1 0 0 0 1 0 1 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 1 1 0
 0 1 1 0 0 0 0 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 0 0 1
 1 0 1 0 1 1 1 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 1 0 0 0 1 0 0 0 1 0 0 0 1
 1 1 1 1 0 0 0 0 0 0 1 0 0 1 1 1 0 0 0 1 1 0 1 0 0 1 0 0 1 0 1 0 1 0 1 1 1
 0 0 0 1 0 1 0 0 0 1 1 0 0 0 1 0 0 1 0 1 0 0 1 0 0 1 0 1 0 1 0 1 1 0 0 0 0
 1 0 0 0 0 0 0 1 1 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0
 1 1 0 0 0 1 0 1 0 0 0 0 1 0 0 1 0 1 0 0 1 0 0 0 0 1 0 1 0 1 1 0 0 0 0 0 1
 1 0 0 1 0 1 0 1 0 0 0 0 1 0 0 0 1 1 0 1 0 0 0 0 1 0 1 0 1 1 1 0 1 0 1 0 0
 1 0 1 0 1 1 0 1].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

Please provide a [Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve) — U13-Forward, Jul 21 '18 at 06:18
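A self-contained reproduction along the lines the comment asks for might look like this. Final_df is not shown in the question, so the DataFrame below is hypothetical: the columns pclass, age and fare and their random values are invented, and only the survived column name comes from the question. Running it should raise the same ValueError at the predict call.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for Final_df: the real data isn't shown, so these
# column names and values are invented purely to reproduce the error.
rng = np.random.default_rng(123)
n = 200
Final_df = pd.DataFrame({
    "pclass": rng.integers(1, 4, size=n),
    "age": rng.uniform(1, 80, size=n),
    "fare": rng.uniform(5, 100, size=n),
    "survived": rng.integers(0, 2, size=n),
})

X = Final_df.drop("survived", axis=1)
Y = Final_df["survived"]
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=123
)

logreg = LogisticRegression()
logreg.fit(X_train, Y_train)

error_message = None
try:
    # The bug from the question: passing the 1D label Series instead of the features.
    logreg.predict(Y_test)
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```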

score 3 · Answer 1 · Yannis

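You need to pass X_test, not Y_test, to predict(). X holds the independent variables (the features you predict from) and Y holds the dependent variable (the label you want to predict). The model was fit on X_train, a 2D table of features, so predict() expects the same kind of 2D input; Y_test is a 1D array of labels, which is why check_array raises "Expected 2D array, got 1D array instead".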

Thus, your last line should be:

Y_pred=logreg.predict(X_test)
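To see the mismatch concretely, here is a small sketch on made-up NumPy data (hypothetical, not the question's Final_df) that prints the shapes involved. It also shows why the reshape(-1, 1) hint in the error message is not the fix here: it turns the labels into a one-column matrix, but the model was fit on three features, so scikit-learn still raises a ValueError (the exact wording varies between versions).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: 100 samples with 3 numeric features and a 0/1 label.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Simple positional split: first 70 rows for training, last 30 for testing.
X_train, X_test = X[:70], X[70:]
y_train, y_test = y[:70], y[70:]

model = LogisticRegression().fit(X_train, y_train)

print(X_test.shape)  # (30, 3): one row per sample, one column per feature
print(y_test.shape)  # (30,): one label per sample, no feature axis

# The error message suggests reshape(-1, 1), but that only turns the labels
# into a 1-column matrix; the model was fit on 3 features, so this still fails.
reshape_error = None
try:
    model.predict(y_test.reshape(-1, 1))
except ValueError as exc:
    reshape_error = str(exc)
print(reshape_error)

# Passing the feature matrix works and yields one predicted label per test row.
y_pred = model.predict(X_test)
print(y_pred.shape)  # (30,)
```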

edited Jul 21 '18 at 11:32

answered Jul 21 '18 at 10:04


score 0 · Answer 2 · answered Jul 21 '18 at 16:23

The model should make its predictions from X_test, not Y_test.

Use this:

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X = Final_df.drop('survived', axis=1)
Y = Final_df['survived']

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=123)
logreg = LogisticRegression()
logreg.fit(X_train, Y_train)
train, test = train_test_split(Final_df, test_size=0.2)

# Here is the change
Y_pred = logreg.predict(X_test)
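The line `train,test = train_test_split(Final_df, test_size=0.2)` performs a second, independent split whose result is never used. train_test_split can either split X and Y together (four outputs) or split the whole DataFrame (two outputs, which you then separate into features and label yourself); one of the two is enough. A sketch of both forms, on a hypothetical stand-in for Final_df with invented columns:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for Final_df: invented features plus the 'survived' label.
rng = np.random.default_rng(0)
n = 100
Final_df = pd.DataFrame({
    "pclass": rng.integers(1, 4, size=n),
    "age": rng.uniform(1, 80, size=n),
    "survived": rng.integers(0, 2, size=n),
})

# Form 1: split features and label together; returns four objects.
X = Final_df.drop("survived", axis=1)
Y = Final_df["survived"]
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=123
)

# Form 2: split the whole DataFrame; returns two DataFrames,
# which are then separated into features and label by hand.
train, test = train_test_split(Final_df, test_size=0.3, random_state=123)
X_train2, Y_train2 = train.drop("survived", axis=1), train["survived"]
X_test2, Y_test2 = test.drop("survived", axis=1), test["survived"]

print(X_train.shape, X_test.shape)    # (70, 2) (30, 2)
print(X_train2.shape, X_test2.shape)  # (70, 2) (30, 2)
```

With the same test_size and random_state, both forms should select the same rows, because the shuffle depends only on the number of rows and the seed. In the question the second call uses test_size=0.2 and no random_state, so its train/test would not line up with X_train/X_test anyway.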
