Imputer on some Dataframe columns in Python

Question

I am learning how to use Imputer in Python.

This is my code:

df=pd.DataFrame([["XXL", 8, "black", "class 1", 22],
["L", np.nan, "gray", "class 2", 20],
["XL", 10, "blue", "class 2", 19],
["M", np.nan, "orange", "class 1", 17],
["M", 11, "green", "class 3", np.nan],
["M", 7, "red", "class 1", 22]])

df.columns=["size", "price", "color", "class", "boh"]

from sklearn.preprocessing import Imputer

imp=Imputer(missing_values="NaN", strategy="mean" )
imp.fit(df["price"])

df["price"]=imp.transform(df["price"])

However, this raises the following error: ValueError: Length of values does not match length of index

What's wrong with my code?

Thanks for helping

score 17 · Answer 1 · answered Jul 26 '16 at 10:49

This is because Imputer expects two-dimensional input such as a DataFrame, while df["price"] is a one-dimensional Series. A possible solution is:

imp=Imputer(missing_values="NaN", strategy="mean" )
imp.fit(df[["price"]])
df["price"]=imp.transform(df[["price"]]).ravel()

# Or even 
imp=Imputer(missing_values="NaN", strategy="mean" )
df["price"]=imp.fit_transform(df[["price"]]).ravel()

frist

Why is `ravel()` necessary here? It seems to return the correct type without it – KevinG Apr 21 '18 at 19:54
1. If you pass the two-dimensional df[["price"]], ravel() is not needed. Imputer and fit_transform only need 2D input, and df[["price"]] gives the data the shape (row count, 1).
2. If you pass the one-dimensional df["price"], adding ravel() does not help. The call below still fails with ValueError: Expected 2D array, got 1D array instead:
df["price"]=imp.fit_transform(df["price"]).ravel()
– Jagannath Banerjee Sep 28 '18 at 09:51
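
To make the 1D versus 2D distinction from the comments concrete, here is a minimal sketch that only inspects shapes. It uses pandas and NumPy alone; the names series_1d, frame_2d and flat are made up for illustration and do not come from the answer.

```python
import numpy as np
import pandas as pd

# Same "price" values as in the question.
df = pd.DataFrame({"price": [8, np.nan, 10, np.nan, 11, 7]})

series_1d = df["price"]    # single brackets: a Series, one-dimensional
frame_2d = df[["price"]]   # double brackets: a one-column DataFrame, two-dimensional

print(series_1d.shape)  # (6,)
print(frame_2d.shape)   # (6, 1)

# ravel() flattens a (6, 1) array back to (6,), which is the shape
# a single DataFrame column naturally has.
flat = frame_2d.to_numpy().ravel()
print(flat.shape)  # (6,)
```

An imputer that requires 2D input accepts frame_2d but rejects series_1d, which is the difference between the question's code and the fix above.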

score 3 · Answer 2 · answered May 31 '19 at 12:23

See the documentation for SimpleImputer. Its fit method takes an array-like or sparse matrix as the input parameter, so you can try this:

imp.fit(df.iloc[:,1:2]) 
df['price']=imp.transform(df.iloc[:,1:2])

Pass the column to the fit method by index location, then apply the transform.

>>> df
   size  price   color    class   boh
 0  XXL    8.0   black  class 1  22.0
 1    L    9.0    gray  class 2  20.0
 2   XL   10.0    blue  class 2  19.0
 3    M    9.0  orange  class 1  17.0
 4    M   11.0   green  class 3   NaN
 5    M    7.0     red  class 1  22.0

You can do the same for boh:

imp.fit(df.iloc[:,4:5])
df['boh']=imp.transform(df.iloc[:,4:5])
>>> df
   size  price   color    class   boh
 0  XXL    8.0   black  class 1  22.0
 1    L    9.0    gray  class 2  20.0
 2   XL   10.0    blue  class 2  19.0
 3    M    9.0  orange  class 1  17.0
 4    M   11.0   green  class 3  20.0
 5    M    7.0     red  class 1  22.0
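
For readers on a recent scikit-learn, where Imputer is no longer available in sklearn.preprocessing, the two steps above can probably be combined with SimpleImputer from sklearn.impute. This is a sketch under that assumption, not the answerer's exact code; numeric_cols is a hypothetical name introduced here.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame(
    [["XXL", 8, "black", "class 1", 22],
     ["L", np.nan, "gray", "class 2", 20],
     ["XL", 10, "blue", "class 2", 19],
     ["M", np.nan, "orange", "class 1", 17],
     ["M", 11, "green", "class 3", np.nan],
     ["M", 7, "red", "class 1", 22]],
    columns=["size", "price", "color", "class", "boh"],
)

# Hypothetical helper name: the numeric columns that contain NaN.
numeric_cols = ["price", "boh"]

# SimpleImputer takes np.nan itself rather than the string "NaN".
imp = SimpleImputer(missing_values=np.nan, strategy="mean")

# A list of column names selects a 2D DataFrame, so no reshape is needed.
# Each column is filled with its own mean.
df[numeric_cols] = imp.fit_transform(df[numeric_cols])

print(df)
```

Selecting both columns at once avoids refitting the imputer per column, and the string columns are left untouched because they are never passed to the imputer.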

Kindly correct me if I am wrong. Suggestions will be appreciated.

score 2 · Answer 3 · answered Jul 26 '16 at 08:20

I think you want to specify the axis for the imputer, then transpose the array it returns:

import pandas as pd
import numpy as np

df=pd.DataFrame([["XXL", 8, "black", "class 1", 22],
["L", np.nan, "gray", "class 2", 20],
["XL", 10, "blue", "class 2", 19],
["M", np.nan, "orange", "class 1", 17],
["M", 11, "green", "class 3", np.nan],
["M", 7, "red", "class 1", 22]])

df.columns=["size", "price", "color", "class", "boh"]

from sklearn.preprocessing import Imputer

imp=Imputer(missing_values="NaN", strategy="mean",axis=1 ) #specify axis
q = imp.fit_transform(df["price"]).T #perform a transpose operation

df["price"]=q
print(df)

Unfortunately this isn't working for me :( ValueError: Expected 2D array, got 1D array instead: – Indi Feb 14 '18 at 07:42
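
The error in the comment above is what newer scikit-learn versions raise for 1D input. One common way around it, sketched here with SimpleImputer (which, as far as I know, has no axis parameter), is to reshape the column to (n, 1) before imputing and flatten it afterwards. The names column_2d, filled and price_filled are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

price = pd.Series([8, np.nan, 10, np.nan, 11, 7], name="price")

# reshape(-1, 1) turns the 1D values of shape (6,) into one column of shape (6, 1).
column_2d = price.to_numpy().reshape(-1, 1)

imp = SimpleImputer(missing_values=np.nan, strategy="mean")

# The result is also (6, 1); ravel() brings it back to 1D for the Series.
filled = imp.fit_transform(column_2d).ravel()

price_filled = pd.Series(filled, index=price.index, name=price.name)
print(price_filled)
```

This replaces the axis=1 plus transpose trick with an explicit reshape, so the imputer always sees one feature column.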

score 1 · Answer 4 · answered Aug 19 '18 at 06:23

A simple solution is to provide a 2D array:

df=pd.DataFrame([["XXL", 8, "black", "class 1", 22],
["L", np.nan, "gray", "class 2", 20],
["XL", 10, "blue", "class 2", 19],
["M", np.nan, "orange", "class 1", 17],
["M", 11, "green", "class 3", np.nan],
["M", 7, "red", "class 1", 22]])

df.columns=["size", "price", "color", "class", "boh"]

from sklearn.preprocessing import Imputer

imp=Imputer(missing_values="NaN", strategy="mean" )
imp.fit(df[["price"]])

df["price"]=imp.transform(df[["price"]])

df['boh'] = imp.fit_transform(df[['boh']])
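
A small sketch of why the same imputer object can be reused for a second column: fit stores the learned column means in the statistics_ attribute, and calling fit_transform again refits and overwrites them. SimpleImputer is assumed here in place of the older Imputer class, and price_mean and boh_mean are names made up for the example.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({
    "price": [8, np.nan, 10, np.nan, 11, 7],
    "boh": [22, 20, 19, 17, np.nan, 22],
})

imp = SimpleImputer(missing_values=np.nan, strategy="mean")

# fit() learns the mean of the column it is given.
imp.fit(df[["price"]])
price_mean = imp.statistics_.copy()

# fit_transform() on another column refits the imputer,
# so statistics_ now holds the mean of "boh" instead.
df["boh"] = imp.fit_transform(df[["boh"]]).ravel()
boh_mean = imp.statistics_.copy()

print(price_mean, boh_mean)
```

Because the second call refits, each column is filled with its own mean rather than the mean learned from price.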

Here is your DataFrame

Cleaned DataFrame
