Imputation of mixed data types with pandas and Scikit-Learn

Asked Mar 30 '23 at 21:06

Active Mar 30 '23 at 21:06

Viewed 88 times

I have to create a pre-processing pipeline dynamically to impute missing values. That is, I want to go through all the columns in a pandas DataFrame (which I don't know beforehand) and impute their missing values. To impute the missing values I use sklearn.impute.SimpleImputer.

I use a different imputer depending on whether the column is numerical or not, like this:


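from sklearn.impute import SimpleImputer

# median for numerical columns, most frequent value for everything else
numerical_imputer = SimpleImputer(strategy='median')
categorical_imputer = SimpleImputer(missing_values=None, strategy='most_frequent')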

My problem is that pandas sometimes encodes the missing values as np.nan, None, or pd.NA, and it's not always the same one. If I force a single missing-value encoding, it changes the whole column dtype, which is something I don't want to do.
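To make the situation concrete, here is a minimal sketch for object-dtype columns only. `unify_missing` is a hypothetical helper and the data is made up; it relies on `notna()` treating `None`, `pd.NA` and `np.nan` alike, and it is not claimed to cover numeric or nullable extension dtypes.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer


def unify_missing(df):
    """Hypothetical helper: map None / pd.NA / np.nan in object columns to np.nan."""
    out = df.copy()
    for col in out.select_dtypes(include="object").columns:
        # keep non-missing values, write np.nan everywhere else
        out[col] = out[col].where(out[col].notna(), np.nan)
    return out


# One object column holding all three missing-value markers.
df = pd.DataFrame(
    {"city": pd.Series(["a", None, pd.NA, np.nan, "a", "b"], dtype=object)}
)
clean = unify_missing(df)

# With a single marker, the default missing_values=np.nan is enough.
imputed = SimpleImputer(strategy="most_frequent").fit_transform(clean)
print(clean["city"].dtype)
print(imputed.ravel().tolist())
```

In this sketch the column stays object dtype after the replacement, so only the missing-value marker changes.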

Is there any way to make this work with any data type and any of the missing-value encodings that pandas may use?

Rodrigo A

0 Answers