I'm trying to encode data from a CSV file. The TA in my class recommended LabelEncoder from sklearn. There's one column named education_level, and I need to encode it in "High, Medium, Low" order. But LabelEncoder.fit_transform sorts the labels alphabetically by default, which means it encodes them in "High, Low, Medium" order.
I found no way to make it use a self-defined order. Code attached below.
# -*- coding: utf-8 -*-
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics
# load train.csv
df = pd.read_csv('./train.csv')
objfeatures = df.select_dtypes(include="object").columns
le = preprocessing.LabelEncoder()
# Use Label Encoder
# TODO
# Any Better Way to encode the data? How to deal with missing values
for feat in objfeatures:
    # astype(str) turns NaN into the string 'nan', which becomes its own class
    df[feat] = le.fit_transform(df[feat].astype(str))
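For reference, one possible way to get a self-defined order is an ordered pandas Categorical, whose integer codes follow the position of each label in the list you pass. This is only a minimal sketch: the small DataFrame is made-up sample data standing in for train.csv, and the column name education_level_code is hypothetical. sklearn's OrdinalEncoder also accepts a categories argument that could serve the same purpose.

```python
import pandas as pd

# Made-up sample standing in for the education_level column of train.csv
df = pd.DataFrame({"education_level": ["High", "Low", "Medium", None, "High"]})

# Desired order: High -> 0, Medium -> 1, Low -> 2
order = ["High", "Medium", "Low"]

# An ordered categorical assigns integer codes by position in `order`
cat = pd.Categorical(df["education_level"], categories=order, ordered=True)

# Missing values (and labels not listed in `order`) get the code -1
df["education_level_code"] = cat.codes

print(df)
```

With this approach missing values stay distinguishable as -1 instead of being turned into a 'nan' class, so they can be imputed or dropped in a separate step.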