I get different training scores when I use a pipeline versus doing the same steps manually.
MANUAL :
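# imports used below (Pipeline is assumed to be the imblearn one, since it contains a sampler)
from imblearn import over_sampling
from imblearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
# standardize data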
sc=StandardScaler()
X_train[['age','balance','duration']] = sc.fit_transform(X_train[['age','balance','duration']])
X_test[['age','balance','duration']] = sc.transform(X_test[['age','balance','duration']])
#applying SMOTE
X_oversampling , y_oversampling = over_sampling.SMOTE(random_state=42).fit_resample(X_train,y_train)
#modelling
model_lr = LogisticRegression()
model_lr.fit(X_oversampling, y_oversampling)
#evaluation
y_pred = model_lr.predict(X_test)
y_pred_train = model_lr.predict(X_oversampling)
print(f'Train Accuracy Score : {round(accuracy_score(y_oversampling,y_pred_train),4)}')
print(f'Test Accuracy Score : {round(accuracy_score(y_test,y_pred),4)}')
#result
Train Accuracy Score : 0.835
Test Accuracy Score : 0.82
WITH PIPELINE :
pipeline_logreg = Pipeline([('sampling', over_sampling.SMOTE(random_state=42)),
('logreg', LogisticRegression())])
pipeline_logreg.fit(X_train,y_train)
# Note: StandardScaler is not included in the pipeline because the data was
# already standardized manually in the code above (see "#standardize data").
#evaluation
y_pred = pipeline_logreg.predict(X_test)
y_pred_train = pipeline_logreg.predict(X_train)
print(f'Train Accuracy Score : {round(accuracy_score(y_train,y_pred_train),4)}')
print(f'Test Accuracy Score : {round(accuracy_score(y_test,y_pred),4)}')
#result
Train Accuracy Score : 0.8261
Test Accuracy Score : 0.82
So why is the training accuracy different? The test accuracy is the same in both cases.