
I have the following column in my DataFrame:

city

London
Paris
New York
...

I am label encoding the column, and it assigns 0 to London, 1 to Paris and 2 to New York. But when I pass a single value to the model for prediction, for example the city name New York, the encoder assigns 0 to it. How can I keep the encoding consistent? If the label encoder assigns 2 to New York during training, I want it to assign 2 to New York again at prediction time.

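Code:
import pandas as pd
from sklearn.preprocessing import LabelEncoder
df = pd.DataFrame({'city': ['London', 'Paris', 'New York']})  # sample of the column above
labelencoder = LabelEncoder()
df['city'] = labelencoder.fit_transform(df['city'])  # fits AND encodes in one step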
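A minimal sketch reproducing the behaviour, assuming a three-row sample of the column (the `train` frame and the `encoder`, `train_codes` and `refit_codes` names are illustrative only). It shows that `LabelEncoder` stores its classes in sorted order, and that calling `fit_transform` again on a single prediction-time value learns a new one-class mapping, which is why New York comes out as 0.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical training data standing in for the question's DataFrame.
train = pd.DataFrame({'city': ['London', 'Paris', 'New York']})

encoder = LabelEncoder()
train_codes = encoder.fit_transform(train['city'])
print(list(encoder.classes_))  # classes are stored in sorted order
print(train_codes.tolist())    # codes for London, Paris, New York

# Refitting on a single prediction-time value learns a brand-new mapping,
# so 'New York' is the only class and is encoded as 0.
refit_codes = LabelEncoder().fit_transform(['New York'])
print(refit_codes.tolist())
```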
Hamza

1 Answer


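You need to call fit (or fit_transform) once, on the training data, to fit the encoder, and then call transform on any data you want to encode later. If you call fit_transform on the new data, the encoder is re-fitted, and if you pass only one value, that value will be encoded as 0. Note that LabelEncoder assigns codes in sorted order of the class names, which is why New York gets 1 and Paris gets 2 here: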

df['label'] = labelencoder.fit_transform(df['city'])
# df
#        city  label
# 0    London      0
# 1     Paris      2
# 2  New York      1
labelencoder.transform(['New York'])
# array([1])
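The pattern above can be sketched end to end as follows, assuming the encoder is fitted once on the training cities (the data and the `mapping` and `new_codes` names are hypothetical). It prints the learned mapping for inspection, encodes a prediction-time value with `transform`, maps codes back with `inverse_transform`, and shows that a city never seen during fitting raises `ValueError` instead of silently getting a new code.

```python
from sklearn.preprocessing import LabelEncoder

# Fit once, on the training values only (hypothetical data).
encoder = LabelEncoder()
encoder.fit(['London', 'Paris', 'New York'])

# The learned mapping, for inspection: class name -> integer code.
mapping = {str(cls): code for code, cls in enumerate(encoder.classes_)}
print(mapping)

# At prediction time, call transform (not fit_transform) on the new input.
new_codes = encoder.transform(['New York'])
print(new_codes.tolist())

# inverse_transform maps integer codes back to the original city names.
print(list(encoder.inverse_transform([1, 2])))

# A value the encoder never saw during fit raises ValueError.
try:
    encoder.transform(['Berlin'])
except ValueError as err:
    print('unseen label:', err)
```

If prediction runs in a separate process, the fitted encoder object has to be saved and reloaded alongside the model so the same mapping is reused.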
perl
  • What if New York also appears in another column? Will it reuse the code from this column or from that column? – Hamza May 18 '21 at 12:23
  • @Haseeb You specify which column the encoder is fitted on, so it is not affected by other columns (here it encodes the values of the `city` column, since we pass `df['city']`) – perl May 18 '21 at 17:47
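To illustrate the last comment, here is a sketch with one encoder per column, assuming a hypothetical second column named `destination` that also contains New York. Because each encoder only learns from the column it is fitted on, New York can receive different codes in the two columns; at prediction time you reuse the encoder that belongs to the column being encoded.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical frame: 'New York' appears in two different columns.
df = pd.DataFrame({
    'city': ['London', 'Paris', 'New York'],
    'destination': ['New York', 'Tokyo', 'Tokyo'],
})

# One encoder per column; each learns only from the column it is fit on.
encoders = {}
for col in ['city', 'destination']:
    encoders[col] = LabelEncoder()
    df[col + '_label'] = encoders[col].fit_transform(df[col])

print(df)

# At prediction time, reuse the encoder that belongs to each column.
print(encoders['city'].transform(['New York']).tolist())
print(encoders['destination'].transform(['New York']).tolist())
```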