LabelEncoding large amounts of categorical data

Question

I have a dataset with 39 categorical and 27 numerical features. I am trying to encode the categorical data, and I need to be able to inverse transform each column and call transform on it again later. Is there a prettier way of doing this than defining 39 separate LabelEncoder instances and then calling fit_transform on each column individually?

I feel like I am missing something obvious, but I can't figure it out!

import numpy as np
from sklearn.preprocessing import LabelEncoder

# Names of the categorical (object-dtype) columns
cat_feat = [col for col in input_df2.columns if input_df2[col].dtype == 'object']
cat_feat = np.asarray(cat_feat)

# One encoder per categorical column
le1 = LabelEncoder()
le2 = LabelEncoder()
le3 = LabelEncoder()
...
# extended to le39

def label(df):
    # Each column uses its own encoder so it can be inverse-transformed later
    df.iloc[:, 1] = le1.fit_transform(df.iloc[:, 1])
    df.iloc[:, 3] = le2.fit_transform(df.iloc[:, 3])
    df.iloc[:, 4] = le3.fit_transform(df.iloc[:, 4])
    ...
    return df

Please share the code you have so that it is easier for others to comment. One guess would be to do this in a for loop. — Sadra, Mar 10 '22 at 21:18
Sorry - added now! Didn't want to fill up the page with 39 unwieldy encoders! — Tom_Scott, Mar 10 '22 at 21:24

score 1 · Accepted Answer · 2022-03-10T21:46:36.277

DataFrame.apply is made for this. It calls the given function once for each column of the DataFrame (or once for each row, if you pass axis=1):

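from sklearn.preprocessing import LabelEncoder

encoders = []  # fitted encoders, in the order the columns are processed

def apply_label_encoder(col):
    le = LabelEncoder()
    encoders.append(le)  # keep the encoder so the column can be inverse-transformed later
    return le.fit_transform(col)  # return the encoded values so apply can collect them

input_df.iloc[:, 1:] = input_df.iloc[:, 1:].apply(apply_label_encoder)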
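Since the question also asks about inverse transforming and calling transform again, here is a minimal sketch of one way that could look. It is a variation on the code above, not the only approach: the encoders are kept in a dict keyed by column name instead of a list, and only the categorical columns are encoded. The toy DataFrame, the column names (`id`, `color`, `size`), and the names `fit_encode`, `encoded_df`, `new_df` are all made up for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data standing in for input_df (column names are assumptions).
input_df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "color": ["red", "blue", "red", "green"],
    "size": ["S", "M", "L", "M"],
})

# Listed by hand here; in the question this comes from the dtype check.
cat_feat = ["color", "size"]

encoders = {}  # column name -> fitted LabelEncoder

def fit_encode(col):
    """Fit a fresh encoder on one column, remember it, and return the codes."""
    le = LabelEncoder()
    encoders[col.name] = le
    return le.fit_transform(col)

# Encode only the categorical columns; numeric columns are left untouched.
encoded_df = input_df.copy()
encoded_df[cat_feat] = input_df[cat_feat].apply(fit_encode)

# Inverse transform one column back to its original labels.
restored = encoders["color"].inverse_transform(encoded_df["color"])

# Reuse the already fitted encoders on new data with the same columns.
new_df = pd.DataFrame({"color": ["green", "red"], "size": ["L", "S"]})
new_encoded = new_df[cat_feat].apply(lambda col: encoders[col.name].transform(col))
```

Looking encoders up by name avoids having to remember which list position belongs to which column. Note that transform raises a ValueError for labels that were not seen during fit. Also, scikit-learn describes LabelEncoder as meant for target labels; OrdinalEncoder is its counterpart for feature columns and can handle several columns with a single object, so it may be worth a look as well.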

edited Mar 10 '22 at 21:46

answered Mar 10 '22 at 21:27

Thank you - I have not used apply too much. Would I still need to define individual encoders for each column? – Tom_Scott Mar 10 '22 at 21:42
Oh, sorry! I didn't understand fully. Wait a moment :) – Mar 10 '22 at 21:43
Check the answer now. If you want to access the encoder for a particular column, e.g. the first one, just use `encoders[0]` (zero-based). – Mar 10 '22 at 21:47
