
Calculating and imputing the mean using dask-ml works fine when imputing every column that contains np.nan:

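import numpy as np
import pandas as pd
from dask_ml import impute

# Replace every NaN with the mean of its column
imputer = impute.SimpleImputer(strategy='mean')
data = [[100, 2], [np.nan, np.nan], [70, 7]]
df = pd.DataFrame(data, columns=['Weight', 'Age'])
x3 = imputer.fit_transform(df)
print(x3)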

    Weight  Age
 0  100.0   2.0
 1  85.0    4.5
 2  70.0    7.0

But what if I need to leave Age untouched? Is it possible to specify which columns to impute?

ps0604

1 Answer


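You should be able to restrict the imputation to specific columns by fitting on just those columns and assigning the result back. The imputer expects 2D input, so select the column as a one-column DataFrame, e.g. df[["Weight"]] or df.loc[:, ["Weight"]]; passing the Series directly, as in df.Weight = imputer.fit_transform(df.Weight), fails because df.Weight is 1D.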

  • Can you please provide the full code? I'm getting `Expected 2D array, got 1D array instead` – ps0604 Dec 22 '20 at 15:00
  • The input has to be a 2D array. I converted the code to use iloc, but you can still use loc by reshaping. `df.iloc[:, 0:1] = imputer.fit_transform(df.iloc[:, 0:1])` – Kaan Berke UĞURLAR Dec 22 '20 at 15:09
  • This worked. As an additional comment, it is possible to select any number of columns in `iloc` to calculate the mean, not just one or a range of columns. – ps0604 Dec 22 '20 at 15:17
  • If I understood correctly, yes, that is possible. To select the 0th, 1st, and 2nd columns, you can do something like the following: `df.iloc[:, [0, 1, 2]]` – Kaan Berke UĞURLAR Dec 22 '20 at 17:36
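
Putting the thread together, here is a minimal sketch of imputing only the chosen columns while leaving the rest untouched. It is an illustration under assumptions, not the answerer's exact code: scikit-learn's `SimpleImputer` stands in for the dask-ml one so the snippet needs fewer dependencies, and the `cols` list is a name introduced here for the columns to impute.

```python
import numpy as np
import pandas as pd
# Assumption: scikit-learn's imputer is used in place of dask_ml.impute.SimpleImputer.
from sklearn.impute import SimpleImputer

data = [[100, 2], [np.nan, np.nan], [70, 7]]
df = pd.DataFrame(data, columns=["Weight", "Age"])

# Hypothetical helper variable: the columns that should be imputed.
cols = ["Weight"]

imputer = SimpleImputer(strategy="mean")

# Selecting with a list keeps the input 2D (a one-column DataFrame, not a Series),
# which avoids the "Expected 2D array, got 1D array instead" error.
df[cols] = imputer.fit_transform(df[cols])

print(df)
```

Only Weight is filled with its column mean; the NaN in Age is left as it was. Adding more names to `cols` imputes several columns at once, which mirrors the `df.iloc[:, [0, 1, 2]]` selection mentioned in the comments.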