Today I tried out CatBoost, the library recently published by Yandex, but it gives very poor results even on a toy dataset. I tried to find the root of the problem, but because documentation and discussion of the library are still scarce, I can't figure out what is going on. Please help me =) I'm using Anaconda 3 x64 with Python 3.6.

import matplotlib.pyplot as plt
from catboost import CatBoostClassifier
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

# Toy binary classification problem: 10 features, 4 of them informative.
X, y = make_classification(
    n_classes=2,
    n_clusters_per_class=2,
    n_features=10,
    n_informative=4,
    n_repeated=2,
    shuffle=True,
    random_state=564,
    n_samples=10000,
)

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8)

cb = CatBoostClassifier(
    depth=3,
    custom_loss=['Accuracy', 'AUC'],
    logging_level='Silent',
    iterations=500,
    od_type='Iter',
    od_wait=20,
)
cb.fit(X_train, y_train, eval_set=(X_test, y_test), plot=True, use_best_model=True)
pred = cb.predict_proba(X_test)[:, 1]
# roc_curve returns (fpr, tpr, thresholds), in that order.
fpr, tpr, _ = roc_curve(y_true=y_test, y_score=pred)

# Just to show the difference: sklearn's default gradient boosting as a baseline.
gbc = GradientBoostingClassifier().fit(X_train, y_train)
pred_gbc = gbc.predict_proba(X_test)[:, 1]
fpr_gbc, tpr_gbc, _ = roc_curve(y_true=y_test, y_score=pred_gbc)

plt.plot(fpr, tpr, color='orange')    # CatBoost
plt.plot(fpr_gbc, tpr_gbc, color='red')  # GradientBoostingClassifier
plt.show()
  • What do you mean by "poor"? Which results did you compare with? xgboost? – Sixiang.Hu Feb 05 '18 at 16:37
  • By poor I mean ROC AUC = 0.5. It was a bug in version 0.6 of CatBoost; 0.6.1 shows an AUC higher than 0.8 (I don't remember the exact value) and beats the default sklearn GradientBoostingClassifier. – machine_learner Feb 06 '18 at 17:47

1 Answer

It was a bug in CatBoost 0.6, and it was fixed in version 0.6.1. Make sure you are using the latest version.
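If you want to confirm which release you have before re-running the experiment, something like the following may help. It is only a rough sketch: `parse_version`, `has_fix` and `FIXED_IN` are names made up for this illustration (they are not part of CatBoost), the parsing is deliberately naive, and `importlib.metadata` needs Python 3.8 or newer, so on the Python 3.6 setup from the question you would have to read the version some other way.

```python
from importlib.metadata import PackageNotFoundError, version

# Release that, according to the answer above, contains the fix.
FIXED_IN = (0, 6, 1)


def parse_version(text):
    """Turn a string such as '0.6.1' into (0, 6, 1).

    Hypothetical helper: only the leading digits of the first three
    dot-separated pieces are used, so '0.6.1rc1' also gives (0, 6, 1).
    """
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def has_fix(installed):
    """Return True if the given version string is 0.6.1 or newer."""
    return parse_version(installed) >= FIXED_IN


try:
    installed = version("catboost")
except PackageNotFoundError:
    installed = None

if installed is None:
    print("catboost is not installed in this environment")
elif has_fix(installed):
    print("catboost", installed, "is at or past 0.6.1")
else:
    print("catboost", installed, "predates 0.6.1; consider upgrading")
```

If it reports a release older than 0.6.1, upgrading and re-running the script from the question should show whether the AUC of 0.5 goes away.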