I am trying to build a pipeline of transformations. In the example below, undersample_train_set accepts three parameters: X is a DataFrame of features, y is a np.array of labels, and strategy_count is a dictionary of counts for each label. SMOTE_train_set accepts similar parameters, plus cat_cols (an array marking the categorical features) and knn (the value passed to k_neighbors; I use knn=1).
I want to put these steps into a Pipeline. Before that, I wrap each function in a FunctionTransformer, passing the extra arguments through kw_args, and then add the resulting transformers to the pipeline as shown in the code.
However, the pipeline raises this error: TypeError: undersample_train_set() missing 1 required positional argument: 'y'
I have been reading the documentation and examples on Stack Overflow and similar sites, and every example I found uses a function that takes only X, while mine take both X and y. Is that the reason the Pipeline throws the error? I tested the FunctionTransformer by calling fit on my X and y and it worked fine with the expected results, but it won't run in the Pipeline. Any hint as to what I am doing wrong?
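A minimal sketch of what appears to be going on, assuming a recent scikit-learn: FunctionTransformer.fit does not call the wrapped function at all (when no inverse_func is given), while transform calls func(X, **kw_args) and never forwards y. That would explain why fit on X and y seems to work and the error only shows up once the Pipeline calls transform. The function needs_y and its scale argument are hypothetical stand-ins for the resampling functions, not part of the original code.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

calls = []  # records whether the wrapped function was ever invoked


def needs_y(X, y, scale):
    """Hypothetical stand-in for a function that requires both X and y."""
    calls.append("called")
    return X * scale, y


X = np.arange(6, dtype=float).reshape(3, 2)
y = np.array([0, 1, 0])

ft = FunctionTransformer(needs_y, kw_args={"scale": 2.0})

# fit accepts y but does not call needs_y, so no error is raised here.
ft.fit(X, y)
calls_after_fit = list(calls)

# transform only receives X and calls needs_y(X, scale=2.0), so y is missing.
try:
    ft.transform(X)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

So the two-argument signature does look like the cause: a FunctionTransformer is meant for functions of X alone, and it cannot change y or the number of samples.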
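from imblearn.over_sampling import SMOTENC
from imblearn.under_sampling import RandomUnderSampler


def undersample_train_set(X, y, strategy_count):
    # Randomly drop samples until each class matches strategy_count
    under = RandomUnderSampler(sampling_strategy=strategy_count, random_state=42)
    X_resample, y_resample = under.fit_resample(X, y)
    return X_resample, y_resample


def SMOTE_train_set(X, y, cat_cols, strategy_count, knn):
    # Oversample with SMOTE-NC, which handles mixed numeric/categorical features
    smote_nc = SMOTENC(categorical_features=cat_cols,
                       sampling_strategy=strategy_count,
                       random_state=1,
                       k_neighbors=knn)
    X_resample, y_resample = smote_nc.fit_resample(X, y)
    return X_resample, y_resample
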
transformer_under = FunctionTransformer(undersample_train_set,
                                        kw_args={'strategy_count': under_strategy_count})
transformer_SMOTE = FunctionTransformer(SMOTE_train_set,
                                        kw_args={'cat_cols': cat_cols_bool_arr,
                                                 'strategy_count': SMOTE_strategy_count,
                                                 'knn': 1})

# Pipeline
pipe_transformations = Pipeline([('under', transformer_under),
                                 ('smote', transformer_SMOTE)]).fit_transform(X, y)