
I have a logistic regression and a random forest, and I'd like to combine them (ensemble) for the final classification probability by averaging their predicted probabilities.

Is there a built-in way to do this in scikit-learn? Some way where I can use the ensemble of the two as a classifier itself? Or would I need to roll my own classifier?

user1507844
  • You need to roll your own, there's no way to combine two arbitrary classifiers. – Matti Lyra Feb 03 '14 at 16:32
  • There are several ongoing PRs and open issues on the sklearn GitHub which are working towards having ensemble meta-estimators. Unfortunately none of them have been merged. – Daniel Feb 04 '14 at 03:47
  • @user1507844 could you take a stab at a similar question here? http://stackoverflow.com/questions/23645837/learning-an-ensemble-model-for-muliple-runs-of-logistic-regression-on-very-lar – ekta May 14 '14 at 05:00

4 Answers


NOTE: The scikit-learn VotingClassifier is probably the best way to do this now.

OLD ANSWER:

For what it's worth I ended up doing this as follows:

import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin

class EnsembleClassifier(BaseEstimator, ClassifierMixin):
    """Averages the predict_proba outputs of an arbitrary list of classifiers."""
    def __init__(self, classifiers=None):
        self.classifiers = classifiers

    def fit(self, X, y):
        # Fit every underlying classifier on the same training data
        for classifier in self.classifiers:
            classifier.fit(X, y)
        return self

    def predict_proba(self, X):
        # Collect each classifier's probability estimates and average them
        self.predictions_ = list()
        for classifier in self.classifiers:
            self.predictions_.append(classifier.predict_proba(X))
        return np.mean(self.predictions_, axis=0)
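
A minimal usage sketch with the two models from the question; the make_classification dataset here is just a stand-in for real data:

# Usage sketch: average a logistic regression and a random forest
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=0)
ensemble = EnsembleClassifier(classifiers=[LogisticRegression(), RandomForestClassifier()])
ensemble.fit(X, y)
averaged_probabilities = ensemble.predict_proba(X)  # mean of the two models' predict_proba outputs
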
user1507844
  • Did you consider calibrating your estimators before averaging their prediction distributions? http://scikit-learn.org/stable/modules/calibration.html – trianta2 Apr 22 '15 at 13:13
  • Haven't tried that yet as it only came out in 0.16 but plan to try soon – user1507844 Apr 22 '15 at 17:02
  • I've tried calibrating, but at least for my specific problem, it actually made things worse... – user1507844 May 08 '15 at 05:32
  • @user1507844 You're probably getting worse performance because you're equally weighting all the classifiers' predictions. A better approach may be to try to minimize your loss function with a weight vector when combining the predictions (see the sketch after these comments). Look at the code here after line 50: https://www.kaggle.com/hsperr/otto-group-product-classification-challenge/finding-ensamble-weights You could even optimize the hyperparameters of your individual classifiers using a package like http://hyperopt.github.io/hyperopt/ – Ryan Jun 29 '15 at 07:45
  • @Ryan that example code is not very useful, mostly because different algorithms fit the training vs. validation samples to different degrees. For example, a random forest can easily fit 100% of the training data, while logistic regression might fit only 70%. On a validation set they could give similar results, but the approach from the link above would greatly overweight the RF over the LR. – Marat Zaynutdinoff Jul 10 '15 at 17:02
  • Commenting on old question, do not take the mean of proba, but rather take the mean of class prediction. – Merlin May 20 '16 at 03:21
  • @Merlin, could you please explain why taking the mean class prediction is better? – Temak Apr 23 '17 at 13:43
  • @Merlin Also, can you really take the mean of class prediction? Do you mean take the mode (most common predicted class)? And what if the probability is what you really care about – then you have to look at the probabilities, right? – user1507844 Apr 23 '17 at 21:25
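
Following up on Ryan's comment above, a hedged sketch of what loss-minimizing ensemble weights could look like; the helper names and the use of scipy.optimize.minimize are illustrative and not taken from the linked Kaggle kernel:

# Illustrative sketch: learn non-negative weights that minimize log loss on a
# held-out set, then use them for a weighted average of predict_proba outputs.
import numpy as np
from scipy.optimize import minimize
from sklearn.metrics import log_loss

def weighted_log_loss(weights, probas, y_true):
    # probas: list of (n_samples, n_classes) predict_proba arrays on a validation set
    weights = np.asarray(weights) / np.sum(weights)  # normalize weights to sum to 1
    blended = sum(w * p for w, p in zip(weights, probas))
    return log_loss(y_true, blended)

def find_ensemble_weights(probas, y_true):
    n = len(probas)
    result = minimize(weighted_log_loss, x0=np.ones(n) / n, args=(probas, y_true),
                      method='SLSQP', bounds=[(0.0, 1.0)] * n,
                      constraints={'type': 'eq', 'fun': lambda w: 1.0 - np.sum(w)})
    return result.x  # one weight per classifier
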

Given the same problem, I used a majority voting method. Combining probabilities/scores arbitrarily is very problematic, in that the performance of your different classifiers can be different (for example, an SVM with two different kernels, plus a random forest, plus another classifier trained on a different training set).

One possible method to "weigh" the different classifiers might be to use their Jaccard score as a "weight". (But be warned: as I understand it, the different scores are not "all made equal". I know that a gradient boosting classifier I have in my ensemble gives all its scores as 0.97, 0.98, 1.00 or 0.41/0, i.e. it's very overconfident.)
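
A minimal sketch of weighted hard voting along these lines, where each classifier's weight comes from a held-out validation metric (the exact weighting scheme is an assumption, not code from this answer):

# Illustrative sketch: weight each classifier's vote by a validation score
# (the answer suggests the Jaccard score; any held-out metric could be substituted).
import numpy as np

def weighted_vote(classifiers, weights, X):
    # classifiers: already-fitted estimators; weights: one non-negative weight each
    weights = np.asarray(weights)
    votes = np.array([clf.predict(X) for clf in classifiers])  # shape (n_clf, n_samples)
    classes = np.unique(votes)
    # For each sample, sum the weights of the classifiers voting for each class
    scores = np.array([[np.sum(weights[votes[:, i] == c]) for c in classes]
                       for i in range(votes.shape[1])])
    return classes[np.argmax(scores, axis=1)]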

GrimSqueaker
  • Majority voting is fine for predicting which class an observation is in, but what if I want to know the probability of it being in that class? I'm fitting my individual classifiers to minimize log loss which I think avoids the "overconfidence" problem you describe. – user1507844 Mar 02 '14 at 16:18
  • The problem is with varying levels of performance by different predictors, mainly. – GrimSqueaker Mar 02 '14 at 16:33
  • I'm no expert, but perhaps there is a way to weight the different predictors based on their performance. Is that what the Jaccard score you mention does? – user1507844 Mar 02 '14 at 20:03
  • The Jaccard score is a statistical score/performance metric, like accuracy, precision, recall, etc. (the Jaccard similarity coefficient score). – GrimSqueaker Mar 03 '14 at 17:52
  • @user1507844: yes, and (using **stacking**) those weights can be learned from a second-stage classifier (typically logistic regression, but it could also be weighted averaging); moreover logistic regression gives more power than fixed weights; we can implicitly learn the specific cases where each classifier is good and bad. We train the level-2 classifier using both the features and the results from the level-1 classifiers. Indeed you could even create level-2 (meta)features. – smci Aug 18 '15 at 20:59

What about the sklearn.ensemble.VotingClassifier?

http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.VotingClassifier.html#sklearn.ensemble.VotingClassifier

Per the description:

The idea behind the voting classifier implementation is to combine conceptually different machine learning classifiers and use a majority vote or the average predicted probabilities (soft vote) to predict the class labels. Such a classifier can be useful for a set of equally well performing models in order to balance out their individual weaknesses.
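
A minimal sketch with the two models from the original question, using soft voting to average their predicted probabilities (the iris data is just a placeholder):

# Soft voting averages the predict_proba outputs of the underlying models
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
voter = VotingClassifier(
    estimators=[('lr', LogisticRegression(max_iter=1000)),
                ('rf', RandomForestClassifier(random_state=42))],
    voting='soft')  # 'soft' averages probabilities; 'hard' takes a majority vote
voter.fit(X, y)
probabilities = voter.predict_proba(X)  # averaged class probabilities
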

Gabriel
  • That didn't exist when I originally posted this question, but it is the proper sklearn implementation of my code I think. Great to see it in there now! – user1507844 Dec 14 '16 at 01:26
  • Excellent. I was wondering though, after looking at it, if it would be possible to have different features for each classifier... – Gabriel Dec 14 '16 at 01:30

Now scikit-learn has StackingClassifier, which can be used to stack multiple estimators.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Base (level-1) estimators
estimators = [
    ('rf', RandomForestClassifier(n_estimators=10, random_state=42)),
    ('lg', LogisticRegression()),
]

# The final_estimator is trained on the base estimators' predictions
clf = StackingClassifier(estimators=estimators, final_estimator=LogisticRegression())

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)
clf.fit(X_train, y_train)
clf.predict_proba(X_test)
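
By default, StackingClassifier trains the final_estimator on cross-validated predictions of the base estimators (controlled by its cv parameter), which helps keep the meta-model from simply inheriting the base models' fit to the training data.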
Natheer Alabsi