11

I am learning how to use Imputer in Python.

This is my code:

import pandas as pd
import numpy as np

df=pd.DataFrame([["XXL", 8, "black", "class 1", 22],
["L", np.nan, "gray", "class 2", 20],
["XL", 10, "blue", "class 2", 19],
["M", np.nan, "orange", "class 1", 17],
["M", 11, "green", "class 3", np.nan],
["M", 7, "red", "class 1", 22]])

df.columns=["size", "price", "color", "class", "boh"]

from sklearn.preprocessing import Imputer

imp=Imputer(missing_values="NaN", strategy="mean" )
imp.fit(df["price"])

df["price"]=imp.transform(df["price"])

However, this raises the following error: ValueError: Length of values does not match length of index

What's wrong with my code?

Thanks for helping

kevins_1
Mauro Gentile

4 Answers

17

This is because Imputer expects two-dimensional input (such as a DataFrame) rather than a one-dimensional Series. A possible solution is:

imp=Imputer(missing_values="NaN", strategy="mean" )
imp.fit(df[["price"]])
df["price"]=imp.transform(df[["price"]]).ravel()

# Or even 
imp=Imputer(missing_values="NaN", strategy="mean" )
df["price"]=imp.fit_transform(df[["price"]]).ravel()
frist
  • Why is `ravel()` necessary here? It seems to return the correct type without it. – KevinG Apr 21 '18 at 19:54
  • 1. If you pass the two-dimensional df[["price"]], then ravel() is not needed. All that Imputer and fit_transform need is two-dimensional input, and df[["price"]] gives the data the shape (row count, 1). 2. If you pass the one-dimensional df["price"] instead, the following call raises ValueError: Expected 2D array, got 1D array instead: df["price"]=imp.fit_transform(df["price"]).ravel() – Jagannath Banerjee Sep 28 '18 at 09:51
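To make the shapes discussed in these comments concrete, here is a minimal sketch. It assumes a recent scikit-learn, where `sklearn.preprocessing.Imputer` has been removed (as of version 0.22) and `sklearn.impute.SimpleImputer` takes its place; the names `filled_2d` and `filled_1d` are only illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer  # assumed replacement for the removed Imputer

df = pd.DataFrame({"price": [8, np.nan, 10, np.nan, 11, 7]})

imp = SimpleImputer(missing_values=np.nan, strategy="mean")

# Double brackets select a one-column DataFrame, so the input is 2D: (6, 1)
filled_2d = imp.fit_transform(df[["price"]])
print(filled_2d.shape)

# ravel() flattens the (6, 1) result to (6,), the same shape as a Series
filled_1d = filled_2d.ravel()
print(filled_1d.shape)

df["price"] = filled_1d
print(df)
```

Whether the `(6, 1)` array can be assigned to a single column without `ravel()` may depend on the pandas version, so flattening first is the more conservative choice.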
3

According to the documentation for SimpleImputer, the fit method takes an array-like or sparse matrix as its input parameter. You can try this:

imp.fit(df.iloc[:,1:2]) 
df['price']=imp.transform(df.iloc[:,1:2])

Pass the column to the fit method by index location (a slice such as 1:2 keeps it two-dimensional) and then apply the transform.

>>> df
   size  price   color    class   boh
 0  XXL    8.0   black  class 1  22.0
 1    L    9.0    gray  class 2  20.0
 2   XL   10.0    blue  class 2  19.0
 3    M    9.0  orange  class 1  17.0
 4    M   11.0   green  class 3   NaN
 5    M    7.0     red  class 1  22.0

You can do the same for boh:

imp.fit(df.iloc[:,4:5])
df['boh']=imp.transform(df.iloc[:,4:5])
>>> df
    size  price   color    class   boh
 0  XXL    8.0   black  class 1  22.0
 1    L    9.0    gray  class 2  20.0
 2   XL   10.0    blue  class 2  19.0
 3    M    9.0  orange  class 1  17.0
 4    M   11.0   green  class 3  20.0
 5    M    7.0     red  class 1  22.0
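For anyone on a current scikit-learn, the same positional approach can be sketched with `SimpleImputer` (the old `Imputer` class was removed in version 0.22). This is an illustration under that assumption, with `ravel()` added so a flat array is assigned to each column.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer  # assumed replacement for the removed Imputer

df = pd.DataFrame(
    [["XXL", 8, "black", "class 1", 22],
     ["L", np.nan, "gray", "class 2", 20],
     ["XL", 10, "blue", "class 2", 19],
     ["M", np.nan, "orange", "class 1", 17],
     ["M", 11, "green", "class 3", np.nan],
     ["M", 7, "red", "class 1", 22]],
    columns=["size", "price", "color", "class", "boh"],
)

imp = SimpleImputer(strategy="mean")

# iloc[:, 1:2] is a slice, so it returns a one-column DataFrame (2D), not a Series
df["price"] = imp.fit_transform(df.iloc[:, 1:2]).ravel()

# Refit on column 4 so that boh is filled with its own mean
df["boh"] = imp.fit_transform(df.iloc[:, 4:5]).ravel()

print(df)
```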

Kindly correct me if I am wrong. Suggestions will be appreciated.

shinchaan
2

I think you want to specify the axis for the imputer, then transpose the array it returns:

import pandas as pd
import numpy as np

df=pd.DataFrame([["XXL", 8, "black", "class 1", 22],
["L", np.nan, "gray", "class 2", 20],
["XL", 10, "blue", "class 2", 19],
["M", np.nan, "orange", "class 1", 17],
["M", 11, "green", "class 3", np.nan],
["M", 7, "red", "class 1", 22]])

df.columns=["size", "price", "color", "class", "boh"]

from sklearn.preprocessing import Imputer

imp=Imputer(missing_values="NaN", strategy="mean",axis=1 ) #specify axis
q = imp.fit_transform(df["price"]).T #perform a transpose operation


df["price"]=q
print(df)
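As I understand it, the transpose matters because old scikit-learn versions treated a 1D input as a single row of shape (1, n) and issued a deprecation warning. The sketch below uses only NumPy and pandas to show the two layouts. It does not call the imputer, and the names `as_row` and `as_column` are illustrative.

```python
import numpy as np
import pandas as pd

price = pd.Series([8, np.nan, 10, np.nan, 11, 7], name="price")

# One sample with six features: the layout a 1D input was coerced into
as_row = price.to_numpy().reshape(1, -1)

# Six samples with one feature: the layout a single column should have
as_column = price.to_numpy().reshape(-1, 1)

print(as_row.shape)     # (1, 6)
print(as_column.shape)  # (6, 1)

# Transposing the row layout gives the column layout
print(np.allclose(as_row.T, as_column, equal_nan=True))
```

Passing the column layout directly, for example with `df[["price"]]`, avoids the need for `axis=1` and the transpose.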
Ryan
1

A simple solution is to provide a 2D array:

import pandas as pd
import numpy as np

df=pd.DataFrame([["XXL", 8, "black", "class 1", 22],
["L", np.nan, "gray", "class 2", 20],
["XL", 10, "blue", "class 2", 19],
["M", np.nan, "orange", "class 1", 17],
["M", 11, "green", "class 3", np.nan],
["M", 7, "red", "class 1", 22]])

df.columns=["size", "price", "color", "class", "boh"]

from sklearn.preprocessing import Imputer

imp=Imputer(missing_values="NaN", strategy="mean" )
imp.fit(df[["price"]])

df["price"]=imp.transform(df[["price"]])

df['boh'] = imp.fit_transform(df[['boh']])

Here is your DataFrame:

[Image: Cleaned DataFrame]
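Since a 2D selection is all the imputer needs, both numeric columns can also be filled in one call. This is a sketch that assumes a recent scikit-learn with `SimpleImputer` (the old `Imputer` was removed in version 0.22); `num_cols` is just an illustrative name.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer  # assumed replacement for the removed Imputer

df = pd.DataFrame(
    [["XXL", 8, "black", "class 1", 22],
     ["L", np.nan, "gray", "class 2", 20],
     ["XL", 10, "blue", "class 2", 19],
     ["M", np.nan, "orange", "class 1", 17],
     ["M", 11, "green", "class 3", np.nan],
     ["M", 7, "red", "class 1", 22]],
    columns=["size", "price", "color", "class", "boh"],
)

num_cols = ["price", "boh"]

# df[num_cols] is a 2D selection; each column is filled with its own mean
df[num_cols] = SimpleImputer(strategy="mean").fit_transform(df[num_cols])

print(df)
```

The mean strategy works column by column, so price and boh do not influence each other.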

Sachin Prabhu