I tried using PyCaret for a machine learning project and got very high accuracies. When I tried to verify them with my own scikit-learn code, I could not reproduce the numbers. Here is an example that reproduces the issue on the public poker dataset that ships with PyCaret:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pycaret.classification import *
from pycaret.datasets import get_data
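# Load the poker dataset bundled with PyCaret
data = get_data('poker')
# Initialize the experiment; session_id fixes the random seed
grid = setup(data=data, target='CLASS', fold_shuffle=True, session_id=2)
# Train a decision tree and report its cross-validated metrics
dt = create_model('dt')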
This gives a 10-fold cross-validation accuracy of about 57%. When I try to reproduce this number with sklearn on the same dataset and the same model, I get only 49%. Does anyone understand where this difference comes from?
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import accuracy_score
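# Split the full dataset into features and target
X = data.drop('CLASS', axis=1)
y = data['CLASS']
# dt is the decision tree returned by create_model above;
# cv=10 uses sklearn's default 10-fold splitter for classifiers
y_pred_cv = cross_val_predict(dt, X, y, cv=10)
accuracy_score(y, y_pred_cv)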
0.4911698233964679
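
For a closer like-for-like comparison, the sklearn side could be set up to resemble what I understand the PyCaret run to be doing. This is only a sketch under assumptions I have not verified against my PyCaret version: that `setup` holds out part of the data (I assume 30%) before cross-validating, that `fold_shuffle=True` corresponds to a shuffled stratified 10-fold split, and that the reported accuracy is the mean of the per-fold accuracies. It uses a synthetic stand-in dataset (`X_demo`, `y_demo` are hypothetical names) so it runs on its own; with the real data, `X` and `y` from above would take their place.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the poker data (assumption: replace with X, y from above)
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=10, n_informative=5, n_classes=3, random_state=2
)

# Assumed to mirror PyCaret's hold-out: cross-validate on 70% of the rows only
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, train_size=0.7, stratify=y_demo, random_state=2
)

# Shuffled stratified folds, unlike cv=10, which does not shuffle
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=2)

# Fresh decision tree with a fixed seed (assumption: default hyperparameters)
model = DecisionTreeClassifier(random_state=2)

# One accuracy per fold, averaged afterwards, instead of a single
# accuracy computed over the pooled out-of-fold predictions
fold_scores = cross_val_score(model, X_train, y_train, cv=cv, scoring='accuracy')
mean_accuracy = fold_scores.mean()
print(fold_scores.round(3), round(mean_accuracy, 3))
```

If the numbers still disagree after matching the split, the shuffling, and the way the folds are averaged, the difference would have to come from something else, such as preprocessing applied inside `setup`.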