Is it possible to change pandas column data type within a sklearn pipeline?

Question

The sklearn pipeline I am using has multiple transformers, but one of the initial transformers returns numerical columns and the next one expects object-type variables.

Basically, I need to squeeze in a:

data[col] = data[col].astype(object)

for the required columns within the pipeline.

Is there any way to do it?

Note: I am using Feature-engine transformers.

From version 1.1.0, if I remember correctly, Feature-engine's categorical encoders take the parameter ignore_format (default False); setting it to True lets the encoder work on variables that are not of type object. This may simplify your pipeline, because you no longer need to re-cast the variables. – Sole Galli, Sep 21 '21 at 07:27
Also, if the transformer that is returning numerical variables is the CategoricalImputer, you can set the parameter return_object=True, so that it returns object directly. – Sole Galli, Sep 21 '21 at 07:27

score 12 · Accepted Answer · edited May 25 '23 at 13:00

Yes, you can use sklearn.preprocessing.FunctionTransformer. A simple example:

import pandas as pd
from sklearn.preprocessing import FunctionTransformer

def to_object(x):
    # Cast every column of the input to object dtype
    return pd.DataFrame(x).astype(object)

fun_tr = FunctionTransformer(to_object)

y = fun_tr.fit_transform(pd.DataFrame({'a':[1,2,3]}))
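
If only some columns need re-casting, the same idea can be placed inside a Pipeline as its own step. The following is a minimal sketch, not part of the original answer: the helper cast_columns, the step name "to_object", and the column names "a" and "b" are hypothetical and chosen only for illustration. It relies on FunctionTransformer's kw_args parameter to pass the column list to the function.

```python
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


def cast_columns(df, columns, dtype=object):
    """Return a new DataFrame with the given columns cast to dtype.

    Hypothetical helper for illustration; the input frame is not modified.
    """
    return df.astype({col: dtype for col in columns})


# Hypothetical data: "a" should become object, "b" should stay numeric.
data = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})

pipe = Pipeline([
    # Re-cast only the listed columns; kw_args is forwarded to cast_columns.
    ("to_object", FunctionTransformer(cast_columns, kw_args={"columns": ["a"]})),
    # Transformers that expect object columns would follow here.
])

out = pipe.fit_transform(data)
print(out.dtypes)
```

Because the cast is a regular pipeline step, it is applied in both fit and transform, so the downstream transformers always see the same dtypes.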

edited May 25 '23 at 13:00 by Lander Van laer

answered Dec 25 '19 at 11:20 by thushv89
