I am currently using daily financial data to fit an SVM and AdaBoost. To check my results, I ran AdaBoost with n_estimators=1, expecting it to return the same result as a single SVM.
from sklearn.ensemble import AdaBoostClassifier
from sklearn.svm import SVC
svm2 = SVC(C=box_const, kernel='rbf', degree=3, gamma='scale', coef0=0.0,
           shrinking=True, tol=0.001, cache_size=1000, class_weight='balanced',
           verbose=False, max_iter=-1, decision_function_shape='ovr', probability=True)
model2 = AdaBoostClassifier(base_estimator=svm2,
                            n_estimators=1,
                            algorithm='SAMME.R')
model2.fit(X_train, y_train)
svm2.fit(X_train, y_train)
However, even with n_estimators=1, the two models produce different predictions. Have I done something wrong, or is there a specific reason for this result?
>>> model2.predict(X_test)
array([1., 1., 1., 1., 1.])
>>> model2.base_estimator
SVC(C=1, cache_size=1000, class_weight='balanced', probability=True)
>>> svm2.predict(X_test)
array([0., 1., 1., 0., 0.])
>>> svm2
SVC(C=1, cache_size=1000, class_weight='balanced', probability=True)
[Edit] I've found that the way I pass sample_weight to scikit-learn's SVC makes a significant difference.
When I define my model as follows:
svm2 = SVC(C=box_const, kernel='rbf', degree=3, gamma='scale', coef0=0.0,
           shrinking=True, tol=0.001, cache_size=1000, class_weight='balanced',
           verbose=False, max_iter=-1, decision_function_shape='ovr', probability=True)
these two calls yield the same predictions:
svm2.fit(X, y, sample_weight=[1] * len(X))
svm2.fit(X, y)
while
svm2.fit(X, y, sample_weight=[1 / len(X)] * len(X))
yields different results. I believe the discrepancy arises because AdaBoost initializes the sample weights to 1 / len(X). Have I done something wrong in passing sample weights to the SVM?
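To make the comparison reproducible, here is a minimal sketch of the check. It rests on several assumptions: the data is synthetic (make_classification stands in for my financial data), C is fixed at 1.0 rather than box_const, probability is left at its default of False so the fits are deterministic, and the helper fit_scores is made up for this illustration. The scikit-learn documentation describes SVC's sample_weight as rescaling C per sample, so a uniform weight of 1 / len(X) should behave like dividing C by len(X), which is what the last comparison checks.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic stand-in data (an assumption, not the original financial data).
X, y = make_classification(n_samples=128, n_features=5, random_state=0)
n = len(X)
C = 1.0  # assumed value standing in for box_const


def fit_scores(c, sample_weight=None):
    """Hypothetical helper: fit an RBF SVC and return its decision values on X."""
    clf = SVC(C=c, kernel="rbf", gamma="scale", class_weight="balanced")
    clf.fit(X, y, sample_weight=sample_weight)
    return clf.decision_function(X)


plain = fit_scores(C)                          # no sample_weight
unit = fit_scores(C, np.ones(n))               # all weights equal to 1
uniform = fit_scores(C, np.full(n, 1.0 / n))   # AdaBoost-style initial weights
small_c = fit_scores(C / n)                    # no weights, but C divided by n

# Unit weights should match the unweighted fit.
same_unit = np.allclose(plain, unit)
# Uniform 1/n weights should match an unweighted fit with C / n.
same_scaled = np.allclose(uniform, small_c)

print("unit weights match unweighted fit:", same_unit)
print("1/n weights match C/n fit:", same_scaled)
print("max |plain - uniform|:", np.abs(plain - uniform).max())
```

If the second comparison holds, the different predictions come from the much smaller effective C rather than from a mistake in how the weights are passed.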