
I understand that if I run a OneHotEncoder by itself, I am able to change the feature names that it generates from x1_1, x1_2, etc. by calling .get_feature_names e.g.:

encoder.get_feature_names(['Sex', 'AgeGroup'])

will return AgeGroup_1, AgeGroup_2, etc. in place of x1_1, x1_2.

However, if I run the OneHotEncoder as one of several transformations in a ColumnTransformer, how can I set this prefix?

  1. Is there a way to set this prefix before the encoding even starts, e.g. within the initialization parameters to OneHotEncoder, or
  2. somehow in-line with the ColumnTransformer, or
  3. in some other way that avoids string replacement on the column names after the fit_transform?
D Malan
james

1 Answer


From the sklearn docs, I found that you can stop the ColumnTransformer from prepending the transformer name (e.g. encoder__) to each output name by setting the parameter verbose_feature_names_out to False. When you then call get_feature_names_out(), the one-hot columns are prefixed with the original input column name instead (A_1, A_2, etc.). Here is an example:

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

df2 = pd.DataFrame({'A': list('1245'), 'B': list('3456')}, dtype="category")

# The initial dataset
   A  B
0  1  3
1  2  4
2  4  5
3  5  6

transformer = ColumnTransformer([('encoder', OneHotEncoder(), ['A'])],
                               remainder='passthrough',
                               verbose_feature_names_out=False)
transformed = transformer.fit_transform(df2)
transformed_df = pd.DataFrame(transformed,
                              columns=transformer.get_feature_names_out())
transformed_df.head()

# New output
    A_1     A_2     A_4     A_5     B
0   1.0     0.0     0.0     0.0     3
1   0.0     1.0     0.0     0.0     4
2   0.0     0.0     1.0     0.0     5
3   0.0     0.0     0.0     1.0     6

Tested in sklearn version 1.0.2

t T s