I want to build a Pipeline in sklearn and test different models using GridSearchCV.
Here is an example (please do not pay attention to which particular models are chosen):
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.manifold import MDS, TSNE
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

reg = LogisticRegression()
proj1 = PCA(n_components=2)
proj2 = MDS()
proj3 = TSNE()
pipe = Pipeline([('proj', proj1), ('reg', reg)])
param_grid = {
    'reg__C': [0.01, 0.1, 1],  # note: LogisticRegression's parameter is C, not c
}
clf = GridSearchCV(pipe, param_grid=param_grid)
Here, if I want to try different models for the dimensionality reduction step, I have to write a separate pipeline for each one and compare the results by hand. Is there an easier way to do this?
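To make the pain point concrete, the manual comparison I want to avoid looks roughly like the sketch below (X and y come from load_iris purely as stand-in data, and I swapped MDS and TSNE for KernelPCA and TruncatedSVD here, because sklearn's MDS and TSNE implement fit_transform but not transform, so they cannot project the held-out folds during cross-validation):

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, KernelPCA, TruncatedSVD
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)  # stand-in data for this sketch

candidates = {
    'PCA': PCA(n_components=2),
    'KernelPCA': KernelPCA(n_components=2),
    'TruncatedSVD': TruncatedSVD(n_components=2),
}

results = {}
for name, proj in candidates.items():
    # One hand-written pipeline and grid search per candidate.
    pipe = Pipeline([('proj', proj), ('reg', LogisticRegression())])
    clf = GridSearchCV(pipe, param_grid={'reg__C': [0.01, 0.1, 1]})
    clf.fit(X, y)
    results[name] = clf.best_score_

print(results)  # compare the candidates by hand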
One solution I came up with is to define my own class derived from BaseEstimator:
from sklearn.base import BaseEstimator
from sklearn.manifold import MDS

class Projection(BaseEstimator):
    def __init__(self, est_name):
        if est_name == "MDS":
            self.model = MDS()
        ...  # analogous branches for the other estimators
        ...

    def fit_transform(self, X):
        return self.model.fit_transform(X)
I think this will work: I just create a Projection object, pass it to the Pipeline, and use the estimators' names as parameters for it.
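Spelled out, the idea might look something like this sketch (the naming is my own; I store est_name unchanged in __init__ and build the model lazily in fit, because GridSearchCV clones estimators via get_params/set_params, and I give the wrapper separate fit and transform methods so it can serve as an intermediate Pipeline step; again I use PCA and KernelPCA here only because MDS and TSNE lack transform):

from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.decomposition import PCA, KernelPCA

class Projection(BaseEstimator, TransformerMixin):
    """Select a dimensionality-reduction model by name."""

    def __init__(self, est_name='PCA'):
        # Stored as-is so get_params/set_params (and cloning) work.
        self.est_name = est_name

    def fit(self, X, y=None):
        # Build the underlying model lazily, at fit time.
        models = {'PCA': PCA(n_components=2),
                  'KernelPCA': KernelPCA(n_components=2)}
        self.model_ = models[self.est_name]
        self.model_.fit(X)
        return self

    def transform(self, X):
        return self.model_.transform(X)

With that, a single grid search can sweep both the projection choice and the classifier settings:

pipe = Pipeline([('proj', Projection()), ('reg', LogisticRegression())])
param_grid = {
    'proj__est_name': ['PCA', 'KernelPCA'],
    'reg__C': [0.01, 0.1, 1],
}
clf = GridSearchCV(pipe, param_grid=param_grid)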
But to me this approach is a bit chaotic and not scalable: it forces me to define a new class each time I want to compare a different set of models. Taking this solution further, one could implement a class that does the same job with an arbitrary set of models, but that seems overcomplicated to me.
What is the most natural and pythonic way to compare different models?