
I have a dataset in which all the values for each feature are numeric, even the class/label column. In boosting algorithms implemented in Python (like LogitBoost, AdaBoost, gradient boosting), besides the preset base estimators (weak learners, the models that iterate over our data) we can specify a classification algorithm, like SVC from SVM, naive Bayes, and so on. (Some algorithms/packages like XGBoost and CatBoost, which are implemented in Python, can't accept any base estimator other than the ones implemented within the package... maybe an implementation preference?)

With this introduction, I present my problem. The code below doesn't work and gives this error: LogitBoost requires the base estimator to be a regressor.

import pandas as pd
import logitboost
from sklearn import metrics, svm

for j in range(1, 21):
    for i in range(1, 11):
        # Load the i-th train/test segment of the j-th dataset.
        X_train = pd.read_csv('{}-train{}-1.csv'.format(j, i))
        y_train = pd.read_csv('{}-train{}-2.csv'.format(j, i))
        X_test = pd.read_csv('{}-test{}-1.csv'.format(j, i))
        y_test = pd.read_csv('{}-test{}-2.csv'.format(j, i))
        model = logitboost.LogitBoost(base_estimator=svm.SVC())
        # ravel() flattens the single-column label DataFrame to 1-D.
        model.fit(X_train, y_train.values.ravel())
        y_predict = model.predict(X_test)
        accuracy = metrics.accuracy_score(y_test, y_predict)
        print('Accuracy for dataset {}, segment {} is: '.format(j, i), accuracy)
        print('Confusion Matrix for dataset {}, segment {} is: '.format(j, i))
        print(metrics.confusion_matrix(y_test, y_predict))


The line:

model = logitboost.LogitBoost(base_estimator = svm.SVC())

doesn't work with SVC, which I'd expect it to, since LogitBoost is a classification algorithm. However, it does work with SVR (support vector regression), which is not what I want. Can anyone explain why that happens and how I can solve it? I need to use the same base estimators for each algorithm for the sake of a fair comparison...

Thanks.

1 Answer


This is how LogitBoost (and GBM) works: each additional weak learner fits to the residual, and so needs to be a regressor. (The initial prediction in LogitBoost seems to just be 0.5 for all samples; I think other implementations may take the average response instead.)
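To see why a regressor is required, here is a minimal hand-rolled sketch of the residual-fitting loop, in the squared-error flavor of plain GBM (LogitBoost's actual update uses Newton-step working responses, but the shape is the same). The target handed to the weak learner each round is a real-valued residual, which a classifier simply cannot fit:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy binary problem: y in {0, 1}.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Start from a constant prediction (0.5, as in LogitBoost) and
# repeatedly fit a *regressor* to the current residuals.
pred = np.full_like(y, 0.5)
learning_rate = 0.5
for _ in range(20):
    residual = y - pred              # real-valued target -> needs a regressor
    stump = DecisionTreeRegressor(max_depth=1).fit(X, residual)
    pred += learning_rate * stump.predict(X)

accuracy = ((pred > 0.5) == y).mean()
print(accuracy)
```

Swap the `DecisionTreeRegressor` for a classifier and the loop breaks down immediately: a classifier's output lives in the label set, not the reals, so it can't represent the residual corrections being accumulated.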

From the documentation:

base_estimator : object, optional
    The base estimator from which the LogitBoost classifier is built. This should be a regressor. If no base_estimator is specified, a decision stump is used.

AdaBoost is different, in that it alters the weights of the misclassified samples, but each new weak learner still fits the original labels, and so the weak learners are classifiers.
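By way of contrast, here's a minimal sketch using scikit-learn's `AdaBoostClassifier`, which happily accepts SVC as its base estimator because its weak learners predict labels on reweighted samples rather than fitting residuals. (`probability=True` is set so the example runs under both the older SAMME.R and newer SAMME defaults; the estimator is passed positionally because the keyword was renamed from `base_estimator` to `estimator` in scikit-learn 1.2.)

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.svm import SVC

# Toy data; AdaBoost reweights misclassified samples each round,
# so its weak learners are classifiers, not regressors.
X, y = make_classification(n_samples=200, random_state=0)

clf = AdaBoostClassifier(SVC(probability=True), n_estimators=10)
clf.fit(X, y)
print(clf.score(X, y))
```

So if you want the same base estimator across all the boosting algorithms you compare, it has to be one that each algorithm's mechanics can accept, which for LogitBoost means a regressor.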

Ben Reiniger