Filling null values with mean

Question

I have a data set with many NaN values, and I want to fill the nulls in each column with that column's mean. I tried the following code:

def fill_mean():  
    m = [df.columns.get_loc(c) for c in df.columns if c in missing]
    for i in m:
        df[df.columns[i]] =df[df.columns[i]].fillna(value=df[df.columns[i]].mean())
    return df

but I get this error:

TypeError: must be str, not int

Each column I'm trying to fill has a single dtype, either 'float64' or 'O' (object).
I suspect the problem comes from this, but how can I solve it?

Edit: I created a dictionary that maps each column with missing data to that column's dtype.

di = dict(zip(missing, m2)) 
def fill_mean():
    m = [df.columns.get_loc(c) for c in df.columns if c in missing]
    for i in m:
        if di[m] == "dtype('float64')":
            df[df.columns[i]] = df[df.columns[i]].fillna(value=df[df.columns[i]].mean())
    return df

If I run fill_mean() now, I get a different error:

    if di[m] == "dtype('float64')":

TypeError: unhashable type: 'list'

What if I wanted to do it through iteration? – plastico, Apr 27 '18 at 13:04
Can you provide some sample data? – Scott Boston, Apr 27 '18 at 14:11

sacuL · Accepted Answer · 2018-04-27T12:30:56.600

I think you want to first cast your columns as type float, then use df.fillna, using df.mean() as the value argument:

df[["columns", "to", "change"]] = df[["columns", "to", "change"]].astype('float')

df = df.fillna(df.mean())
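If some columns hold genuine text and cannot be cast to float, one option is to restrict the mean to the numeric columns. The following is a minimal sketch under that assumption; the frame and its column names ("price", "label") are invented for illustration and do not come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one float column with a gap, one text column with a gap.
df = pd.DataFrame({
    "price": [1.0, np.nan, 3.0],
    "label": ["a", None, "c"],
})

# numeric_only=True leaves the object column out of the means,
# so fillna only receives a fill value for "price".
filled = df.fillna(df.mean(numeric_only=True))
print(filled)
```

The text column keeps its missing entry, since no mean exists for it.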

Note: If all columns in your dataframe can be cast to float, you can simply do:

df = df.astype('float').fillna(df.astype('float').mean())
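When it is not certain that every value parses as a number, astype('float') raises on the first bad value. A possible alternative, sketched here on made-up data, is pd.to_numeric with errors="coerce", which turns unparseable entries into NaN before the means are computed. Note that this also replaces real text with the mean, so it only fits columns that are meant to be numeric.

```python
import pandas as pd

# Hypothetical frame of strings; "x" cannot be parsed as a number.
df = pd.DataFrame({"a": ["1", "x", "3"], "b": ["2", None, "4"]})

# Coerce each column to numeric; values that fail to parse become NaN.
num = df.apply(pd.to_numeric, errors="coerce")

# Fill every NaN with the mean of its own column.
filled = num.fillna(num.mean())
print(filled)
```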

Example:

import numpy as np
import pandas as pd
df = pd.DataFrame({'col1':np.random.choice([np.nan, '1','2'], 10), 
     'col2':np.random.choice([np.nan, '1', '2'], 10)})


>>> print(df)
  col1 col2
0    2    1
1    2    1
2  nan  nan
3    1    2
4    1    2
5  nan    2
6    2    2
7    2    2
8    1    2
9  nan    1

df[['col1', 'col2']] = df[['col1', 'col2']].astype('float')

df = df.fillna(df.mean())


>>> print(df)
       col1      col2
0  2.000000  1.000000
1  2.000000  1.000000
2  1.571429  1.666667
3  1.000000  2.000000
4  1.000000  2.000000
5  1.571429  2.000000
6  2.000000  2.000000
7  2.000000  2.000000
8  1.000000  2.000000
9  1.571429  1.000000
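On the follow-up about doing this through iteration: the "unhashable type: 'list'" error comes from di[m], which uses the whole list m as a dictionary key instead of a single column. A rough sketch of a loop-based version follows. It takes the frame as an argument and checks the dtype with pandas' is_numeric_dtype helper rather than comparing against a string; the sample data and the names frame, sample and result are assumptions, not taken from the question.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype


def fill_mean(frame):
    """Return a copy with NaNs in numeric columns replaced by the column mean."""
    out = frame.copy()
    for col in out.columns:
        # Only numeric columns have a meaningful mean; skip object columns.
        if is_numeric_dtype(out[col]) and out[col].isna().any():
            out[col] = out[col].fillna(out[col].mean())
    return out


# Hypothetical sample: "x" is float64, "y" is object.
sample = pd.DataFrame({"x": [1.0, np.nan, 5.0], "y": ["a", None, "b"]})
result = fill_mean(sample)
print(result)
```

Working on a copy keeps the caller's frame unchanged, which makes the function easier to test than one that mutates a global df.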
