How to Assign Feature Names in a OneHotEncoder through Column Transformer

Question

I understand that if I run a OneHotEncoder by itself, I am able to change the feature names that it generates from x1_1, x1_2, etc. by calling .get_feature_names e.g.:

encoder.get_feature_names(['Sex', 'AgeGroup'])

will change x1_1, x1_2 to AgeGroup_1, AgeGroup_2, etc.

However, if I run the OneHotEncoder as one of several transformations in a ColumnTransformer, how can I set the prefix?

Is there a way to set this prefix before the encoding even starts, e.g. within the initialization parameters of OneHotEncoder or somehow in-line with the ColumnTransformer, without having to do string replacement on the column names after fit_transform?

t T s · Answer 1 · 2023-02-02T19:53:50.583

From the sklearn docs, what I have found is that you can stop the ColumnTransformer from prefixing every output name with the transformer's name (e.g. encoder__A_1) by setting the parameter verbose_feature_names_out to False. When you then call get_feature_names_out(), each one-hot column is named after the original input column plus the category (e.g. A_1), and passthrough columns keep their original names. Here is an example:

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

df2 = pd.DataFrame({'A': list('1245'), 'B': list('3456')}, dtype="category")

# The initial dataset
#    A  B
# 0  1  3
# 1  2  4
# 2  4  5
# 3  5  6

# Only column A is one-hot encoded; B is passed through unchanged
transformer = ColumnTransformer([('encoder', OneHotEncoder(), ['A'])],
                                remainder='passthrough',
                                verbose_feature_names_out=False)
transformed = transformer.fit_transform(df2)
transformed_df = pd.DataFrame(transformed,
                              columns=transformer.get_feature_names_out())
transformed_df.head()

# New output
#     A_1     A_2     A_4     A_5     B
# 0   1.0     0.0     0.0     0.0     3
# 1   0.0     1.0     0.0     0.0     4
# 2   0.0     0.0     1.0     0.0     5
# 3   0.0     0.0     0.0     1.0     6

Tested in sklearn version 1.0.2
