I have a dataset in which all the values for each feature are numeric, including the class/label column. In boosting algorithms implemented in Python (like LogitBoost, AdaBoost, and gradient boosting), besides the preset base estimators (the weak learners that the algorithm iterates over our data with), we can specify a classification algorithm, such as SVC from SVM, naive Bayes, and so on. (Some packages implemented in Python, like XGBoost and CatBoost, can't accept any base estimator other than the ones implemented within the package... maybe an implementation choice?)
With this introduction, I present my problem. The code below doesn't work and raises this error: LogitBoost requires the base estimator to be a regressor.
import pandas as pd
import logitboost
from sklearn import svm, metrics

for j in range(1, 21):
    for i in range(1, 11):
        X_train = pd.read_csv('{}-train{}-1.csv'.format(j, i))
        y_train = pd.read_csv('{}-train{}-2.csv'.format(j, i))
        X_test = pd.read_csv('{}-test{}-1.csv'.format(j, i))
        y_test = pd.read_csv('{}-test{}-2.csv'.format(j, i))

        model = logitboost.LogitBoost(base_estimator=svm.SVC())
        model.fit(X_train, y_train)
        y_predict = model.predict(X_test)

        accuracy = metrics.accuracy_score(y_test, y_predict)
        print('Accuracy for dataset {}, segment {} is:'.format(j, i), accuracy)
        print('Confusion matrix for dataset {}, segment {} is:'.format(j, i))
        print(metrics.confusion_matrix(y_test, y_predict))
The line:
model = logitboost.LogitBoost(base_estimator=svm.SVC())
doesn't work with SVC, which it should, since LogitBoost is a classification algorithm. It does, however, work with SVR (support vector regression), which is not what I want. Can anyone explain why this happens and how I can solve it? I need to use the same base estimators for every algorithm, for the sake of a fair comparison...
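From skimming the LogitBoost description (Friedman, Hastie and Tibshirani), my understanding so far is that each boosting round fits the base estimator by weighted least squares to a real-valued "working response", so the base learner must output continuous values. Here is a hand-rolled toy sketch of the binary case as I understand it (hypothetical illustration with a `DecisionTreeRegressor`, not the logitboost package's actual code; the clipping constants are my own choices):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeRegressor

def logitboost_fit(X, y, n_rounds=20):
    """Binary LogitBoost sketch: each round fits a REGRESSOR to the
    real-valued working response z, which is why a classifier like
    SVC cannot serve as the base estimator."""
    y = np.asarray(y, dtype=float)          # labels assumed in {0, 1}
    F = np.zeros(len(y))                    # additive model, log-odds scale
    learners = []
    for _ in range(n_rounds):
        p = 1.0 / (1.0 + np.exp(-2.0 * F))  # current class-1 probabilities
        w = np.clip(p * (1.0 - p), 1e-6, None)   # case weights
        z = np.clip((y - p) / w, -4.0, 4.0)      # working response (continuous!)
        h = DecisionTreeRegressor(max_depth=2).fit(X, z, sample_weight=w)
        F += 0.5 * h.predict(X)             # weighted least-squares update
        learners.append(h)
    return learners

def logitboost_predict(learners, X):
    F = sum(0.5 * h.predict(X) for h in learners)
    return (F > 0).astype(int)

# Toy numeric dataset standing in for the real CSVs.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
learners = logitboost_fit(X, y)
print((logitboost_predict(learners, X) == y).mean())  # training accuracy
```

If this picture is right, the fit-to-z step is exactly where SVC would break (z is continuous, not a class label), and it would explain why SVR is accepted while SVC is rejected. But I'd appreciate confirmation and a way to still compare the algorithms fairly.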
Thanks.