0

I'm trying to implement a pipeline consisting of several steps and for a few of the stages I need data in pandas format. Is it possible to implement a wrapper solution in sklearn where I can get "pandas in, pandas out" as a result of sklearn transformation and not numpy array?

I thought of writing a class inheriting the class I wish to include in pipeline and so something like this:

class RFE_Custom(RFE):
    
    def __init__(self, *params):
        super(params)
        
    def fit(self, X, Y):
        
        print("Inside fit function....")
        return super.fit()
        
    def transform(self, X):
        
        print("Inside transform function...")
        base =  super.transform()
        
        # convert to pandas dataframe
        return pd.DataFrame(base, index=X.index)
    
    def predict(self, X):

        print("Inside Predict function...")
        base =  super.predict()
        
        # convert to pandas dataframe
        return pd.DataFrame(base, index=X.index)

However I'm unable to implement this correctly as I'm getting error while calling fit for grid search object. Is there any feasible way around this. I hope this might be a very common issue and there must be some standard way to approach this type of cases where one need to interact with before and after results of pipeline steps.

NOTE: RFE is an example. There are quite few sklearn estimators and classes for which I wish to implement similar concepts.

Alexander L. Hayes
  • 3,892
  • 4
  • 13
  • 34
Anand
  • 361
  • 1
  • 9
  • 23

0 Answers0