I'm trying to implement a pipeline consisting of several steps, and a few of the stages need their input as a pandas DataFrame. Is it possible to write a wrapper in sklearn so that I get "pandas in, pandas out" from an sklearn transformation, rather than a numpy array?
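To make the problem concrete, here is a minimal sketch of the default behaviour I'd like to avoid. The toy frame and its labels are made up for illustration, `StandardScaler` merely stands in for any transformer, and the sketch assumes sklearn's default output configuration has not been changed.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy frame; the column names and index labels are made up for illustration.
df = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]},
    index=["r1", "r2", "r3"],
)

out = StandardScaler().fit_transform(df)

# With default settings the result is a plain ndarray, so the index and
# column labels of the input frame are no longer attached to it.
is_ndarray = isinstance(out, np.ndarray)
has_index = hasattr(out, "index")
print(type(out).__name__, out.shape)
```

Any later stage that relies on `X.index` or on column names therefore breaks once an sklearn step sits in front of it.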
I thought of writing a class that inherits from the class I want to include in the pipeline and does something like this:

    import pandas as pd
    from sklearn.feature_selection import RFE

    class RFE_Custom(RFE):
        # __init__ is inherited from RFE: sklearn clones estimators via
        # get_params(), which rejects *args in the constructor signature.

        def fit(self, X, Y):
            print("Inside fit function....")
            return super().fit(X, Y)

        def transform(self, X):
            print("Inside transform function...")
            base = super().transform(X)
            # convert to pandas dataframe, keeping the input's row index
            return pd.DataFrame(base, index=X.index)

        def predict(self, X):
            print("Inside Predict function...")
            base = super().predict(X)
            # convert to pandas dataframe, keeping the input's row index
            return pd.DataFrame(base, index=X.index)

However, I have not been able to get this working: my attempts fail with an error when I call fit on the grid search object. Is there a feasible way around this? I assume this is a very common issue and that there must be some standard way to handle cases where one needs to work with the inputs and outputs of individual pipeline steps.
NOTE: RFE is just an example. There are quite a few sklearn estimators and classes for which I would like to implement the same idea.
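Since the same conversion would be needed for many classes, one possible direction is a single generic wrapper instead of one subclass per estimator. The following is only a rough sketch under stated assumptions: `PandasOut` is a hypothetical name and not part of sklearn, the wrapped transformer is assumed to return one output row per input row, the toy data is made up, and `predict` is left out.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression


class PandasOut(BaseEstimator, TransformerMixin):
    """Hypothetical wrapper: delegates to `transformer`, returns a DataFrame."""

    def __init__(self, transformer):
        # Stored unmodified under the parameter's own name so that
        # get_params() and clone() (used by grid search) keep working.
        self.transformer = transformer

    def fit(self, X, y=None):
        # Fit a fresh copy so the object passed to __init__ stays untouched.
        self.transformer_ = clone(self.transformer).fit(X, y)
        return self

    def transform(self, X):
        values = self.transformer_.transform(X)
        # Assumes one output row per input row, so the index can be reused.
        # Column labels are not restored here; they become 0, 1, 2, ...
        return pd.DataFrame(values, index=X.index)


# Toy data, made up for illustration.
X = pd.DataFrame(
    {
        "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
        "x2": [0.5, 0.1, 0.9, 0.3, 0.7, 0.2],
        "x3": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
    },
    index=list("abcdef"),
)
y = pd.Series([2.0, 4.1, 5.9, 8.2, 9.8, 12.1], index=X.index)

selector = PandasOut(RFE(LinearRegression(), n_features_to_select=2))
selected = selector.fit_transform(X, y)
print(type(selected).__name__, selected.shape)
```

Because the wrapped object is a regular constructor parameter, its settings should remain reachable in a parameter grid through nested names such as `transformer__n_features_to_select`.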