How to construct a wrapper over sklearn models

Question

I'm trying to implement a pipeline consisting of several steps, and a few of the stages need their data in pandas format. Is it possible to write a wrapper in sklearn that gives "pandas in, pandas out", so that an sklearn transformation returns a pandas DataFrame rather than a NumPy array?

I thought of writing a class that inherits from the class I wish to include in the pipeline and doing something like this:

class RFE_Custom(RFE):
    
    def __init__(self, *params):
        super(params)
        
    def fit(self, X, Y):
        
        print("Inside fit function....")
        return super.fit()
        
    def transform(self, X):
        
        print("Inside transform function...")
        base =  super.transform()
        
        # convert to pandas dataframe
        return pd.DataFrame(base, index=X.index)
    
    def predict(self, X):

        print("Inside Predict function...")
        base =  super.predict()
        
        # convert to pandas dataframe
        return pd.DataFrame(base, index=X.index)

However, I'm unable to implement this correctly: I get an error when calling fit on the grid search object. Is there any feasible way around this? I assume this is a common issue, so there is probably a standard approach for cases where one needs to work with the inputs and outputs of individual pipeline steps.

NOTE: RFE is just an example. There are quite a few sklearn estimators and classes for which I wish to implement something similar.
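
For reference, here is a minimal sketch of how such a subclass could be written so that it survives cloning inside a grid search. It is one possible approach under stated assumptions, not a confirmed fix for the error above. The class name PandasRFE, the LogisticRegression estimator and the synthetic data are all hypothetical choices made for illustration. The snippet above differs in three ways that are likely to matter: super needs to be called as super(), the parent methods need X (and y) passed through, and overriding __init__ with *params conflicts with how sklearn clones estimators through get_params(). scikit-learn 1.2 and later also provide set_output(transform="pandas") on transformers, which may make a custom wrapper unnecessary.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression


class PandasRFE(RFE):
    """Hypothetical RFE subclass whose transform returns a DataFrame."""

    # No __init__ override: sklearn clones estimators via get_params(),
    # which expects explicit keyword arguments rather than *args.

    def transform(self, X):
        # Delegate the actual feature selection to RFE.
        selected = super().transform(X)
        if isinstance(X, pd.DataFrame):
            # support_ is the boolean mask of kept features, set by fit().
            kept_columns = X.columns[self.support_]
            return pd.DataFrame(selected, index=X.index, columns=kept_columns)
        return selected


# Synthetic data, used only to exercise the sketch.
X, y = make_classification(n_samples=100, n_features=6, random_state=0)
X = pd.DataFrame(X, columns=[f"f{i}" for i in range(6)])

selector = PandasRFE(LogisticRegression(), n_features_to_select=3)
X_selected = selector.fit(X, y).transform(X)
print(type(X_selected).__name__, list(X_selected.columns))
```

Only transform is overridden here. fit is inherited unchanged, so it still returns the fitted estimator, which is what Pipeline and GridSearchCV rely on.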

Just out of curiosity, why would you want to convert your results into pandas? — INGl0R1AM0R1, Sep 26 '22 at 13:29

