Below is my pipeline. It seems that I can't pass parameters to my models when they are wrapped in the ModelTransformer class, which I took from this post (http://zacstewart.com/2014/08/05/pipelines-of-featureunions-of-pipelines.html).

The error message makes sense to me, but I don't know how to fix it. Any ideas? Thanks.

# define a pipeline
pipeline = Pipeline([
    ('vect', DictVectorizer(sparse=False)),
    ('scale', preprocessing.MinMaxScaler()),
    ('ess', FeatureUnion(
        n_jobs=-1,
        transformer_list=[
            ('rfc', ModelTransformer(RandomForestClassifier(n_jobs=-1, random_state=1, n_estimators=100))),
            ('svc', ModelTransformer(SVC(random_state=1))),
        ],
        transformer_weights=None)),
    ('es', EnsembleClassifier1()),
])

# define the parameters for the pipeline
parameters = {
'ess__rfc__n_estimators': (100, 200),
}

# ModelTransformer class, taken from the link
# (http://zacstewart.com/2014/08/05/pipelines-of-featureunions-of-pipelines.html)
class ModelTransformer(TransformerMixin):
    def __init__(self, model):
        self.model = model
    def fit(self, *args, **kwargs):
        self.model.fit(*args, **kwargs)
        return self
    def transform(self, X, **transform_params):
        return DataFrame(self.model.predict(X))

grid_search = GridSearchCV(pipeline, parameters, n_jobs=-1, verbose=1, refit=True)

Error Message: ValueError: Invalid parameter n_estimators for estimator ModelTransformer.

nkhuyu
  • Thanks for asking; I had the same question. Let me ask you another thing: do you know why self.model.fit(*args, **kwargs) works? I mean, you don't usually pass hyperparameters like n_estimators when calling the fit method, but when creating the instance, e.g. rfc = RandomForestClassifier(n_estimators=100) followed by rfc.fit(X, y). – drake Apr 24 '16 at 02:59
  • @drake, when you create a ModelTransformer instance, you need to pass in a model with its parameters, for example ModelTransformer(RandomForestClassifier(n_jobs=-1, random_state=1, n_estimators=100)). Here self.model.fit(*args, **kwargs) mostly means self.model.fit(X, y). – nkhuyu Apr 26 '16 at 03:32
  • Thanks, @nkhuyu. I know that's how it works. I was asking why. Since self.model = model, self.model=RandomForestClassifier(n_jobs=-1, random_state=1, n_estimators=100). I understand *args is unpacking (X, y), but I don't understand WHY one needs **kwargs in the fit method when self.model already knows the hyperparameters. – drake Apr 26 '16 at 16:17
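
A minimal sketch of the distinction discussed in the comments above: hyperparameters are fixed when the model is constructed, while *args and **kwargs in fit only forward whatever the caller passes at fit time. ToyModel and its sample_weight argument are hypothetical stand-ins for a scikit-learn estimator, chosen so the snippet runs without any dependencies.

```python
class ToyModel:
    """Hypothetical stand-in for an estimator; not a scikit-learn class."""

    def __init__(self, n_estimators=100):
        # Hyperparameter: fixed when the instance is created
        self.n_estimators = n_estimators

    def fit(self, X, y, sample_weight=None):
        # Record what was received so the forwarding is visible
        self.fit_args_ = (X, y, sample_weight)
        return self


class ModelTransformer:
    def __init__(self, model):
        self.model = model

    def fit(self, *args, **kwargs):
        # Forward positional and keyword arguments unchanged to the wrapped model
        self.model.fit(*args, **kwargs)
        return self


wrapper = ModelTransformer(ToyModel(n_estimators=200))
wrapper.fit([[0], [1]], [0, 1], sample_weight=[1.0, 2.0])

print(wrapper.model.n_estimators)  # hyperparameter, untouched by fit
print(wrapper.model.fit_args_)     # arguments as the wrapped model received them
```

So **kwargs is not there for hyperparameters. It only matters when the caller passes optional fit-time arguments, such as the sample_weight that some scikit-learn estimators accept in fit.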

1 Answer

GridSearchCV uses a special double-underscore naming convention for the parameters of nested objects. In your case, ess__rfc__n_estimators stands for ess.rfc.n_estimators and, according to the definition of the pipeline, it points to the n_estimators attribute of

ModelTransformer(RandomForestClassifier(n_jobs=-1, random_state=1, n_estimators=100))

ModelTransformer instances have no such attribute.

The fix is easy: to reach the object wrapped by ModelTransformer, go through its model attribute. The grid parameters then become

parameters = {
  'ess__rfc__model__n_estimators': (100, 200),
}
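
A hedged way to check which parameter names a pipeline accepts is to inspect get_params() before running the search. The sketch below assumes scikit-learn is installed and makes two changes to the question's code that are my own assumptions: ModelTransformer also inherits from BaseEstimator (so that nested names can be resolved), and transform returns a NumPy column instead of a DataFrame. The pipeline is trimmed to the FeatureUnion step only.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import SVC


class ModelTransformer(TransformerMixin, BaseEstimator):
    # BaseEstimator is an addition here: it supplies get_params/set_params
    def __init__(self, model):
        self.model = model

    def fit(self, *args, **kwargs):
        self.model.fit(*args, **kwargs)
        return self

    def transform(self, X, **transform_params):
        # One column of predictions per wrapped model
        return self.model.predict(X).reshape(-1, 1)


# Trimmed-down version of the question's pipeline (illustrative only)
pipeline = Pipeline([
    ('ess', FeatureUnion([
        ('rfc', ModelTransformer(RandomForestClassifier(random_state=1, n_estimators=100))),
        ('svc', ModelTransformer(SVC(random_state=1))),
    ])),
])

# Every key of get_params(deep=True) is a name GridSearchCV can tune
valid_names = pipeline.get_params(deep=True)
has_fixed_name = 'ess__rfc__model__n_estimators' in valid_names
has_original_name = 'ess__rfc__n_estimators' in valid_names
print(has_fixed_name, has_original_name)

# Setting the nested parameter reaches the wrapped RandomForestClassifier
pipeline.set_params(ess__rfc__model__n_estimators=200)
rfc_wrapper = pipeline.named_steps['ess'].transformer_list[0][1]
print(rfc_wrapper.model.n_estimators)
```

Printing sorted(valid_names) on the real pipeline is a quick way to find the right key whenever GridSearchCV reports an invalid parameter.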

P.S. That's not the only problem with your code. To use multiple jobs in GridSearchCV, every object you use must be copyable. This is achieved by implementing the get_params and set_params methods, which you can inherit from the BaseEstimator mixin.
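
As a rough illustration of the copying requirement, the sketch below uses sklearn.base.clone, which scikit-learn uses to copy estimators. It assumes scikit-learn is installed; PlainWrapper and CloneableWrapper are hypothetical names introduced only for this comparison and differ solely in whether they inherit from BaseEstimator.

```python
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.svm import SVC


class PlainWrapper(TransformerMixin):
    # No BaseEstimator: this class has neither get_params nor set_params
    def __init__(self, model):
        self.model = model


class CloneableWrapper(TransformerMixin, BaseEstimator):
    # BaseEstimator derives get_params/set_params from the __init__ signature
    def __init__(self, model):
        self.model = model


# Cloning the plain wrapper is rejected because get_params is missing
try:
    clone(PlainWrapper(SVC()))
    plain_cloned = True
except TypeError as exc:
    plain_cloned = False
    print('clone failed:', type(exc).__name__)

# Cloning the BaseEstimator-based wrapper also clones the wrapped model
original = CloneableWrapper(SVC(C=2.0))
copied = clone(original)
print(copied.model.C)
print(copied.model is not original.model)
```

For get_params to work this way, __init__ should store each argument unchanged under an attribute of the same name, as both wrappers do here.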

Artem Sobolev
  • Can you expand on this P.S. a bit? I think I have the same issue: when I try to use GridSearchCV with a pipeline containing a FeatureUnion, I get the error AttributeError: 'SelectColumns' object has no attribute 'get_params', where SelectColumns is a class I wrote for the pipeline. – B_Miner Jun 05 '15 at 01:43
  • @B_Miner, you should inherit your `SelectColumns` class from [`BaseEstimator`](http://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html), which provides the aforementioned `set_params` and `get_params`. Alternatively, you can implement your own, but most of the time you don't want to. – Artem Sobolev Jun 05 '15 at 01:59
  • I was looking for BaseEstimatorMixin. I inherited from BaseEstimator and it worked like a charm, thanks! – B_Miner Jun 05 '15 at 02:19
  • @ArtemSobolev I am working on the same kind of thing. I get the error "cannot deepcopy this pattern object" when I try to use cross_val_predict or GridSearchCV with the same pipeline. Could you please show how you did it with a FeatureUnion? – Hareendra Chamara Philips Oct 20 '17 at 11:08