I have a dataframe with a column grade that contains categorical values. My problem is that the values have type float rather than object.
import pandas as pd
import numpy as np
df = pd.DataFrame(
    {
        "key": ["K0", "K1", "K2", "K3", "K4"],
        "grade": [1.0, 2.0, 2.0, np.nan, 3.0],
    }
)
df =
  key  grade
0  K0    1.0
1  K1    2.0
2  K2    2.0
3  K3    NaN
4  K4    3.0
The column grade has missing values. I want to impute them with the most frequent value using feature-engine, which is based on sklearn. Feature-engine includes widely used missing-data imputation methods, such as mean and median imputation, frequent category imputation, and random sample imputation.
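To make the goal concrete, here is a rough pandas-only sketch of the result I am after. It is not the feature-engine approach; df_demo, most_frequent and expected are names I made up for illustration, and I am assuming that "most frequent" means the first value returned by Series.mode().

```python
import numpy as np
import pandas as pd

# Same data as above, under a separate (hypothetical) name.
df_demo = pd.DataFrame(
    {
        "key": ["K0", "K1", "K2", "K3", "K4"],
        "grade": [1.0, 2.0, 2.0, np.nan, 3.0],
    }
)

# Series.mode() skips NaN by default and returns the modes in sorted order;
# taking the first one is my assumption for how ties would be broken.
most_frequent = df_demo["grade"].mode().iloc[0]

# Fill the missing grade with the most frequent value.
expected = df_demo.copy()
expected["grade"] = expected["grade"].fillna(most_frequent)
print(expected)
```

With this data, the missing grade for K3 would be filled with 2.0, the only value that occurs twice.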
Install and load library:
! pip install feature-engine
from feature_engine.imputation import CategoricalImputer
Apply imputer:
# set up the imputer
imputer = CategoricalImputer(variables=['grade'], imputation_method='frequent')
# fit the imputer
imputer.fit(df)
# transform the data
df = imputer.transform(df)
df.head()
I get the following TypeError:
TypeError: Some of the variables are not categorical. Please cast them as object before calling this transformer
I understand what the error says, but I don't understand why it is raised. According to the docs, feature-engine can handle numerical variables with this transformer.
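For reference, the cast that the error message asks for would look roughly like the sketch below, using only pandas. df_cast is a hypothetical copy of my data; I have not confirmed whether this is the intended way to use the transformer, and note that the column is no longer numeric afterwards.

```python
import numpy as np
import pandas as pd

# Hypothetical copy of the data from the question.
df_cast = pd.DataFrame(
    {
        "key": ["K0", "K1", "K2", "K3", "K4"],
        "grade": [1.0, 2.0, 2.0, np.nan, 3.0],
    }
)

print(df_cast["grade"].dtype)  # float64

# Cast the column to object, as the error message requests.
# The NaN entry is preserved; only the dtype changes.
df_cast["grade"] = df_cast["grade"].astype(object)

print(df_cast["grade"].dtype)  # object
```

The imputer from the snippet above would then be fitted on df_cast instead of df.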
My questions are:
- How can I fix this while using the same transformer? Did I misunderstand the docs?
- If this transformer doesn't work, what other solutions do you suggest?