I get different training scores depending on whether I use a pipeline or run the steps manually.

MANUAL :

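#imports (Pipeline comes from imblearn; scikit-learn's Pipeline does not accept samplers such as SMOTE)
from imblearn import over_sampling
from imblearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler

#standardize data
sc = StandardScaler()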
X_train[['age','balance','duration']] = sc.fit_transform(X_train[['age','balance','duration']])
X_test[['age','balance','duration']] = sc.transform(X_test[['age','balance','duration']])

#applying SMOTE
X_oversampling , y_oversampling = over_sampling.SMOTE(random_state=42).fit_resample(X_train,y_train)

#modelling
model_lr = LogisticRegression()
model_lr.fit(X_oversampling, y_oversampling)  


#evaluation
y_pred = model_lr.predict(X_test)
y_pred_train = model_lr.predict(X_oversampling)
print(f'Train Accuracy Score : {round(accuracy_score(y_oversampling,y_pred_train),4)}')
print(f'Test Accuracy Score : {round(accuracy_score(y_test,y_pred),4)}')

#result
Train Accuracy Score : 0.835
Test Accuracy Score : 0.82

WITH PIPELINE :

pipeline_logreg = Pipeline([('sampling', over_sampling.SMOTE(random_state=42)),
                        ('logreg', LogisticRegression())])
pipeline_logreg.fit(X_train,y_train)

Note: I don't include StandardScaler in the pipeline because I have already standardized the data manually in the code above (the #standardize data step).

#evaluation
y_pred = pipeline_logreg.predict(X_test)
y_pred_train = pipeline_logreg.predict(X_train)
print(f'Train Accuracy Score : {round(accuracy_score(y_train,y_pred_train),4)}')
print(f'Test Accuracy Score : {round(accuracy_score(y_test,y_pred),4)}')

#result
Train Accuracy Score : 0.8261
Test Accuracy Score : 0.82

So why is the training accuracy different? The test accuracy is the same in both cases.
