The sklearn pipeline I am using has multiple transformers, but one of the initial transformers returns numerical variables while the next one expects object-type variables.

Basically, I need to squeeze in a:

data[col] = data[col].astype(object)

for the required columns within the pipeline.

Is there any way to do it?

Note: I am using Feature-engine transformers.

k92
  • From version 1.1.0, if I remember correctly, Feature-engine's categorical encoders take the parameter ignore_format (default False). Setting ignore_format=True allows the encoding to be applied to variables that are not of type object. This may simplify your pipeline, because you no longer need to re-cast the variables. – Sole Galli Sep 21 '21 at 07:27
  • Also, if the transformer that returns numerical variables is the CategoricalImputer, you can set the parameter return_object=True so that it returns object variables directly. – Sole Galli Sep 21 '21 at 07:27

1 Answer

Yes, you can use sklearn.preprocessing.FunctionTransformer. A simple example:

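import pandas as pd
from sklearn.preprocessing import FunctionTransformer

def to_object(x):
    # Cast every column of the input to object dtype
    return pd.DataFrame(x).astype(object)

fun_tr = FunctionTransformer(to_object)

y = fun_tr.fit_transform(pd.DataFrame({'a': [1, 2, 3]}))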
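The to_object function casts every column. If only the required columns should be re-cast, as the question asks, one option is to pass the column list to the function through FunctionTransformer's kw_args and place the transformer as a step inside the Pipeline. This is a minimal sketch, assuming the data flowing through the pipeline is a pandas DataFrame. The helper cast_columns_to_object and the column names are hypothetical, and the upstream and downstream steps are left as placeholder comments.

```python
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


def cast_columns_to_object(X, cols):
    """Return a copy of X with only the listed columns cast to object dtype."""
    X = X.copy()  # avoid mutating the caller's DataFrame
    X[cols] = X[cols].astype(object)
    return X


# Hypothetical column list: cast only "a", leave "b" untouched.
to_object = FunctionTransformer(cast_columns_to_object, kw_args={"cols": ["a"]})

pipe = Pipeline([
    # ... the upstream transformer that returns numeric columns would go here ...
    ("to_object", to_object),
    # ... the downstream transformer that expects object columns would go here ...
])

df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})
out = pipe.fit_transform(df)
print(out.dtypes)
```

Selecting columns by name only works if the step before it returns a DataFrame rather than a NumPy array. Feature-engine transformers return DataFrames, so the column names should still be available at this point.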
thushv89