I have a scikit-learn pipeline in which I want to encode a categorical feature. The problem is that an earlier step may delete that feature based on some logic, so in the encoding step I want to encode the feature only if it still exists after removal.

Here is the code I have:

    categorical_transformer = Pipeline(steps=[
        ('categorical_imputer', SimpleImputer(strategy="constant", fill_value='Unknown')),
        ('encoder', OneHotEncoder(handle_unknown='ignore'))
    ])

    preprocess_ppl = ColumnTransformer(
        transformers=[
            # Hard-coded column list: this is the selector that fails when the column is gone
            ('categorical', categorical_transformer, ['MARITAL_STATUS']),
            ('zero_impute', fill_na_zero_transformer, lambda X: [col for col in fill_zero_cols if col in X.columns]),
            ('numeric', numeric_transformer, lambda X: [col for col in num_cols if col in X.columns])
        ]
    )

    pipeline2 = Pipeline(
        steps=[
            ('dropper', drop_cols),
            ('remover', feature_remover),
            ("preprocessor", preprocess_ppl),
            ("estimator", customOLS(sm.OLS))
        ]
    )

Sometimes the dropper or remover step removes the MARITAL_STATUS feature, and the pipeline then raises an error saying that the column is not present in the data.

Is there any way to encode the feature only when it is still present?

Obiii
  • What is your `drop_cols` (and `feature_remover`)? If they return frames, then the same `lambda` you use to select columns for the other transformers should work? – Ben Reiniger Aug 09 '22 at 14:33
  • Hi, drop_cols and feature_remover both return a dataframe. If I use a lambda, the selector would return an empty list [], which will give an error. – Obiii Aug 09 '22 at 15:13
  • What version of sklearn? I thought column transformers handled an empty list of columns gracefully (esp. for the context of `make_column_selector`)? – Ben Reiniger Aug 09 '22 at 17:48
  • I am using 1.1.1, will try with make_column_selector. – Obiii Aug 09 '22 at 21:02
  • Hi, I am using lambda functions in the pipeline. Is it possible to ditch the lambdas and use make_column_selector instead? Lambdas cannot be pickled and I cannot use dill. – Obiii Aug 12 '22 at 09:38
  • Define a proper function. (I was sure I had linked another question for this earlier, but can't find it now.) – Ben Reiniger Aug 17 '22 at 18:36
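
The suggestions in the comments above can be sketched roughly as follows. This is a minimal illustration, not a tested fix for the original pipeline: the helper name `existing_columns`, the `AGE` and `INCOME` columns, the toy data, and the use of `StandardScaler` as a stand-in for `numeric_transformer` are all assumptions made for the example. It relies on `ColumnTransformer` skipping a transformer whose column selection is empty, and on `make_column_selector` objects and `functools.partial` wrappers around a module-level function being picklable where lambdas are not.

```python
import pickle
from functools import partial

import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def existing_columns(X, wanted):
    """Hypothetical helper: return the names in `wanted` still present in X."""
    return [col for col in wanted if col in X.columns]


categorical_transformer = Pipeline(steps=[
    ("categorical_imputer", SimpleImputer(strategy="constant", fill_value="Unknown")),
    ("encoder", OneHotEncoder(handle_unknown="ignore")),
])

preprocess_ppl = ColumnTransformer(transformers=[
    # Regex selector: yields ['MARITAL_STATUS'] if present, otherwise an empty list.
    ("categorical", categorical_transformer,
     make_column_selector(pattern="^MARITAL_STATUS$")),
    # Named function bound with partial instead of a lambda, so it can be pickled.
    # StandardScaler and the column names are stand-ins for the example.
    ("numeric", StandardScaler(),
     partial(existing_columns, wanted=["AGE", "INCOME"])),
])

# Toy data (assumed): one frame with the categorical column, one without it.
with_col = pd.DataFrame({
    "MARITAL_STATUS": ["single", "married", "divorced"],
    "AGE": [30.0, 41.0, 52.0],
    "INCOME": [10.0, 20.0, 35.0],
})
without_col = with_col.drop(columns=["MARITAL_STATUS"])

# Column present: three one-hot columns plus two scaled numeric columns.
out_with = preprocess_ppl.fit_transform(with_col)

# Column removed upstream: the categorical step is skipped, no error is raised.
out_without = preprocess_ppl.fit_transform(without_col)

# The fitted transformer round-trips through plain pickle (no dill needed).
restored = pickle.loads(pickle.dumps(preprocess_ppl))
out_restored = restored.transform(without_col)
```

For a fixed list of names, the `partial` form keeps the list explicit, while `make_column_selector` is convenient when a regex or dtype describes the columns. Either one could replace the hard-coded `['MARITAL_STATUS']` and the lambdas in the question, assuming `drop_cols` and `feature_remover` return DataFrames as stated in the comments.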

0 Answers