I have a dataframe with 332 columns. I want to impute the missing values so that I can use scikit-learn's decision tree classifier. My problem is that the array returned by the imputer has only 330 columns.

from sklearn.preprocessing import Imputer
imp = Imputer(missing_values='NaN', strategy='mean', axis=0)
cols = data.columns
new = imp.fit_transform(data)

print(data.shape,new.shape)
(34132, 332) (34132, 330)
jrlund

1 Answer

According to the documentation of sklearn.preprocessing.Imputer:

When axis=0, columns which only contained missing values at fit are discarded upon transform.

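So the imputer is dropping the columns that contain only missing values. With strategy='mean' there is no mean to compute for such a column, which would explain why 2 of your 332 columns are gone.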
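To see which columns are affected, you can check for all-missing columns before imputing. The snippet below is only a sketch on a made-up three-column frame (the column names a, b, c are hypothetical, not from your data), and it fills the means with pandas' fillna instead of the Imputer, so that it runs without scikit-learn:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real dataframe; column names are hypothetical.
data = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, np.nan],  # every value missing
    "c": [4.0, 5.0, np.nan],
})

# Boolean mask: True for columns in which every value is missing.
is_all_missing = data.isna().all()

# These are the columns a mean imputer has nothing to work with.
all_missing = data.columns[is_all_missing]
print(list(all_missing))

# Columns that survive, in their original order.
kept = data.columns[~is_all_missing]

# Mean imputation on the surviving columns, done with pandas here
# (an assumption for illustration, not the Imputer itself).
filled = data[kept].fillna(data[kept].mean())
print(filled.shape)
```

On your data, len(all_missing) should come out as 2 if this is the whole explanation, and kept gives you the column names that line up with the imputed output.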

Ami Tavory