
How can I encode each unique categorical value to a numerical value of my choice?

HeatingQC: Heating quality and condition

   Ex   Excellent
   Gd   Good
   TA   Average/Typical
   Fa   Fair
   Po   Poor

I tried to encode this categorical data as numbers, so I used sklearn.preprocessing.LabelEncoder. I expected it to assign a higher number to Ex and a lower number to Po, i.e. Ex = 4, Gd = 3, and so on.

from sklearn.preprocessing import LabelEncoder

label_encoder = LabelEncoder()
encoded_data = label_encoder.fit_transform(data)

print(data)
print(encoded_data)

The output is:

Id
1461    TA
1462    TA
1463    Gd
1464    Ex
1465    Ex
Name: HeatingQC, dtype: object
[2 2 1 0 0]

How can I encode Ex to 4 and Po to 0?

David kim
    `LabelEncoder` encodes in alphabetical order. You need to create and use a custom encoder. Do you really want to hardcode every unique value to a corresponding number? – emremrah Apr 13 '20 at 00:26
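The comment's point about alphabetical order can be checked without sklearn. The sketch below is a pure-Python imitation (not sklearn's own code) that sorts the unique labels and uses each label's position as its code. It reproduces the `[2 2 1 0 0]` output from the question, because "Ex" < "Gd" < "TA" alphabetically.

```python
# Imitate LabelEncoder's choice of codes: sort the unique labels
# (alphabetical order for strings) and use each label's index as its code.
data = ['TA', 'TA', 'Gd', 'Ex', 'Ex']  # sample HeatingQC values from the question

classes = sorted(set(data))                          # ['Ex', 'Gd', 'TA']
code_of = {label: i for i, label in enumerate(classes)}
encoded = [code_of[label] for label in data]

print(classes)  # ['Ex', 'Gd', 'TA']
print(encoded)  # [2, 2, 1, 0, 0]
```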

1 Answer


In a very basic way, you can:

  • initialize a map with the codes you want to fix
encoding_map = {
    'Ex': 4,
    'Po': 0,
}
  • give every value not yet in the map the smallest unused number, counting up from the current minimum
for item in data:
    if item not in encoding_map:
        candidate = min(encoding_map.values())
        while candidate in encoding_map.values():
            candidate += 1
        encoding_map[item] = candidate
  • encode the data
encoded_data = [encoding_map[item] for item in data]
emremrah