I have to build a pre-processing pipeline dynamically to impute missing values. That is, I want to go through all the columns of a pandas DataFrame (which I don't know beforehand) and impute their missing values. To impute the missing values I use sklearn.impute.SimpleImputer.

I use a different imputer depending on whether the column is numerical or not, like this:

from sklearn.impute import SimpleImputer

numerical_imputer = SimpleImputer(strategy='median')
categorical_imputer = SimpleImputer(missing_values=None, strategy='most_frequent')
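For concreteness, here is a minimal sketch of how the columns could be split by kind when they are not known in advance. The helper name `split_columns` and the example frame are made up for illustration, and treating every non-numeric column as categorical is an assumption.

```python
import numpy as np
import pandas as pd


def split_columns(df):
    """Hypothetical helper: return (numeric, non-numeric) column names."""
    # select_dtypes(include="number") picks up integer and float columns.
    numeric_cols = df.select_dtypes(include="number").columns.tolist()
    # Assumption: everything that is not numeric is treated as categorical.
    other_cols = [c for c in df.columns if c not in numeric_cols]
    return numeric_cols, other_cols


# Made-up example data with one missing value per imputable column.
df = pd.DataFrame(
    {
        "age": [31.0, np.nan, 45.0],
        "city": pd.Series(["Lima", None, "Quito"], dtype=object),
        "visits": [1, 2, 3],
    }
)

numeric_cols, other_cols = split_columns(df)
```

The numerical imputer would then be applied to `numeric_cols` and the categorical one to `other_cols`.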

My problem is that pandas sometimes encodes the missing values as np.nan, sometimes as None, and sometimes as pd.NA, and it's not always the same one. If I force a single missing-value encoding, it changes the whole column's dtype, which is something I don't want to do.
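One workaround I have been considering, as a rough sketch only: for object columns, map every missing marker (None, np.nan, or pd.NA) to np.nan before imputing, which keeps the dtype as object, and then leave SimpleImputer on its default missing_values=np.nan. The helper name `to_nan` and the example values are made up, and this does not cover numeric or nullable extension dtypes.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer


def to_nan(col):
    """Hypothetical helper: replace None / pd.NA / NaN with np.nan in an object column."""
    # notna() treats all three markers as missing; where() keeps the other values.
    return col.where(col.notna(), np.nan)


# Made-up object column mixing the three missing-value encodings.
raw = pd.Series(["a", None, "b", pd.NA, "a", np.nan], dtype=object)
clean = to_nan(raw)

# The default missing_values is np.nan, so no per-column setting is needed.
imputer = SimpleImputer(strategy="most_frequent")
filled = imputer.fit_transform(clean.to_numpy().reshape(-1, 1))
```

Here `clean` is still an object column, so the original dtype is preserved and only the missing marker is unified.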

Is there any way to make this work with any data type and missing value encoding (of the possible ones for pandas)?

Rodrigo A
