I have a pipeline that uses lambda functions as column selectors:
preprocess_ppl = ColumnTransformer(
    transformers=[
        ('encode', categorical_transformer, make_column_selector(dtype_include=object)),
        ('zero_impute', fill_na_zero_transformer, lambda X: [col for col in fill_zero_cols if col in X.columns]),
        ('numeric', numeric_transformer, lambda X: [col for col in num_cols if col in X.columns])
    ]
)
pipeline2 = Pipeline(
    steps=[
        ('dropper', drop_cols),
        ('remover', feature_remover),
        ("preprocessor", preprocess_ppl),
        ("estimator", customOLS(sm.OLS))
    ]
)
Basically, the lambda functions select only those columns that are present in X. An intermediate step sometimes removes columns, so a column listed in num_cols may no longer exist by the time the preprocessor runs; hence I use lambda functions to select only the columns that are still present.
The problem is that lambda functions cannot be serialized with pickle, and I have to use pickle; I cannot use dill. Is there another way to write these lambda functions?
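One direction that might work is a small callable class defined at module level, since pickle can serialize its instances by reference to the class. The following is only a sketch: the name `PresentColumns` is hypothetical, and a `SimpleNamespace` with a `columns` attribute stands in for the DataFrame so that the example runs without pandas.

```python
import pickle
from types import SimpleNamespace


class PresentColumns:
    """Hypothetical picklable replacement for the lambda selectors.

    Calling an instance returns the wanted columns that exist in X.columns.
    """

    def __init__(self, columns):
        self.columns = list(columns)

    def __call__(self, X):
        # Same logic as the lambda: keep only the columns still present in X.
        return [col for col in self.columns if col in X.columns]


# Assumed example data; num_cols mirrors the list used in the question.
num_cols = ["age", "income", "height"]

# Stand-in for a DataFrame whose "income" column was removed upstream.
X = SimpleNamespace(columns=["age", "height", "city"])

selector = PresentColumns(num_cols)

# Round-trip through pickle, which a lambda would not survive.
restored = pickle.loads(pickle.dumps(selector))
selected = restored(X)
print(selected)
```

In the ColumnTransformer, the instance would then take the place of the lambda, for example `('numeric', numeric_transformer, PresentColumns(num_cols))`. The class has to live at the top level of an importable module so that pickle can find it again when loading.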