How to apply label encoding uniformly in all columns?

Question

I have a dataset of which I have attached an image.

The sets of unique values in Origin and Dest are the same. I expected label encoding to give the value ATL the same code in both 'Origin' and 'Dest', but it turns out that the following code:

label_encoder = LabelEncoder()
flight_f['UniqueCarrier'] = label_encoder.fit_transform(flight_f['UniqueCarrier'])
flight_f['Origin'] = label_encoder.fit_transform(flight_f['Origin'])
flight_f['Dest'] = label_encoder.fit_transform(flight_f['Dest'])

gives a particular value different encodings in the two columns. And this is just the training set. I think the test set might get different values too, which will hamper the predictive analysis.

Can anyone suggest a solution, please?

score 0 · Answer 1 · answered Jun 10 '22 at 05:33

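Instead of calling fit_transform on each column separately, fit one encoder on the values of all the columns that should share a mapping, for example (assuming pandas is imported as pd):

label_encoder.fit(pd.concat([flight_f['Origin'], flight_f['Dest']]))

Every fit_transform call refits the encoder, so the same value can end up with a different code in each column. That's why, instead of fit_transform, you are better off calling fit once and then transform on each column.

Here's an example: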

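from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
# fit once on the training and test values together;
# a second fit() call would replace the first mapping, not extend it
l_train = [1, 2, 3, 4, 5]
l_test = [6, 7, 8]
le.fit(l_train + l_test)

le.transform(l_train)
# array([0, 1, 2, 3, 4], dtype=int64)
le.transform([2, 3, 4, 5, 6, 7])
# array([1, 2, 3, 4, 5, 6], dtype=int64)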
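
To see why the encoder should be fit only once, here is a minimal sketch that inspects classes_ after each fit call. The airport lists are made-up illustration data, not taken from the asker's dataset.

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()

# First fit: the encoder learns exactly these three labels.
le.fit(["ATL", "DFW", "PIT"])
first_classes = le.classes_.tolist()

# Second fit: the earlier labels are discarded, not extended.
le.fit(["CLE", "RDU"])
second_classes = le.classes_.tolist()

# Fitting once on the union gives a single mapping that covers every label.
le.fit(["ATL", "DFW", "PIT"] + ["CLE", "RDU"])
all_classes = le.classes_.tolist()
codes = le.transform(["ATL", "CLE"]).tolist()

print(first_classes, second_classes, all_classes, codes)
```

The position of a label in classes_ is its code, so two columns get matching codes only if they are transformed by an encoder that was fit on the same set of labels.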

score 0 · Answer 2 · answered Jun 18 '22 at 20:46

I think what you need is "stack()":

from sklearn.preprocessing import LabelEncoder
import pandas as pd
label_encoder = LabelEncoder()
df = pd.DataFrame(data=[[8, "ATL", "DFW"], 
                        [9, "PIT", "ATL"],
                        [1, "DFW", "ATL"],
                        [5, "RDU", "CLE"]], columns=["Month", "Origin", "Dest"])

df

Month	Origin	Dest
8	ATL	DFW
9	PIT	ATL
1	DFW	ATL
5	RDU	CLE

label_encoder.fit(df[['Origin','Dest']].stack().unique())

df['Origin_encode'] = label_encoder.transform(df['Origin'])
df['Dest_encode'] = label_encoder.transform(df['Dest'])
df

Month	Origin	Dest	Origin_encode	Dest_encode
8	ATL	DFW	0	2
9	PIT	ATL	3	0
1	DFW	ATL	2	0
5	RDU	CLE	4	1
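
The same fitted encoder can then be reused on a test set, which was the other worry in the question. This is only a sketch: train_df and test_df are hypothetical frames invented for illustration, and it assumes every airport code in the test set also appears somewhere in the training columns.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical training and test frames (illustration data only).
train_df = pd.DataFrame({"Origin": ["ATL", "PIT", "DFW", "RDU"],
                         "Dest": ["DFW", "ATL", "ATL", "CLE"]})
test_df = pd.DataFrame({"Origin": ["CLE", "ATL"],
                        "Dest": ["RDU", "PIT"]})

airport_cols = ["Origin", "Dest"]

# Fit once on every airport code found in the training columns.
encoder = LabelEncoder()
encoder.fit(train_df[airport_cols].stack().tolist())

# Only transform from here on, so train and test share one mapping.
for frame in (train_df, test_df):
    for col in airport_cols:
        frame[col + "_encode"] = encoder.transform(frame[col].tolist())

print(test_df)
```

If the test set contains a code the encoder never saw during fit, transform raises a ValueError, so in that case the encoder has to be fit on the union of training and test values, or unseen codes need to be handled separately.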
