How to pd.fillna(mean()) according to a column value which changes?

Question

I have the following dataframe:

data/hora                                                                      
2017-08-18 09:22:33   22162          NaN        65.9           NaN          NaN
2017-10-03 11:08:26   22162          NaN        60.5           NaN          NaN
2018-02-17 01:45:24   22162          NaN        69.7           NaN          NaN
2018-02-17 01:45:55   74034          NaN        67.5           NaN          NaN
2018-02-17 01:46:29   74034          NaN        65.4           NaN          NaN
2018-02-17 01:47:20   74034          NaN        63.3           NaN          NaN
2018-02-17 01:48:35   74034          NaN        61.3           NaN          NaN
2018-02-17 01:49:08   17448          NaN        63.4           NaN          NaN
2018-02-17 01:49:31   17448          NaN        65.5           NaN          NaN
2018-02-17 01:49:55   17448          NaN        67.6           NaN          NaN

I want to fill the NaNs with the mean of each column. However, this mean changes as the 'Machine' value changes (there are three machine values). Therefore, I need a fillna that changes according to the Machine column value.

I tried:

for i in df:
    if i.isin(df.loc[df['Machine'] == '22162']):
        df.fillna(df.loc[df['Machine'] == '22162'].mean)
    elif i.isin(df.loc[df['Machine'] == '17448']):
        df.fillna(df.loc[df['Machine'] == '17448'].mean)
    elif i.isin(df.loc[df['Machine'] == '74034']):
        df.fillna(df.loc[df['Machine'] == '74034'].mean)

But it didn't work.

Thanks!

Please provide a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example). — Yi Bao, May 29 '19 at 15:05
```for i in df:``` is going to iterate through the column names. — iamchoosinganame, May 29 '19 at 15:07
@iamchoosinganame how do I iterate through the column cells? — Marlon Henrique Teixeira, May 29 '19 at 15:17
You can iterate over rows using DataFrame.iterrows and iterate through the columns using DataFrame.iteritems. However, I think the best approach is performing a groupby on the "Machine" column and then applying fillna as suggested by @WeNYoBen. — iamchoosinganame, May 29 '19 at 15:19
@iamchoosinganame I tried `df.fillna(df.groupby(df.Machine).mean())` but it didn't work. — Marlon Henrique Teixeira, May 29 '19 at 18:00
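
A small sketch of the two points raised in these comments, using a cut-down, made-up frame (the column names `Machine`, `A` and `B` are assumptions for illustration). Iterating over a DataFrame yields column labels, and the result of `groupby(...).mean()` is indexed by the group keys rather than by the original row labels, which is likely why passing it straight to `fillna` finds nothing to align with:

```python
import numpy as np
import pandas as pd

# Made-up miniature of the question's frame; names are assumptions.
df = pd.DataFrame({
    "Machine": ["22162", "22162", "74034"],
    "B": [65.9, 60.5, 67.5],
    "A": [np.nan, np.nan, np.nan],
})

# Iterating a DataFrame yields its column labels, not rows or cells.
iterated = [name for name in df]
print(iterated)

# The per-group means are indexed by machine value ...
group_means = df.groupby("Machine").mean()
index_labels = list(group_means.index)
print(index_labels)

# ... while the original frame is indexed by row position,
# so the two indexes share no labels.
print(list(df.index))
```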

score 1 · Accepted Answer · answered May 29 '19 at 20:25

It's a bit all over the place & hard coded but it should work. I named the NaN columns ['A', 'C', 'D']

         data      hora  machine   A     B   C   D
0  2017-08-18  09:22:33    22162 NaN  65.9 NaN NaN
1  2017-10-03  11:08:26    22162 NaN  60.5 NaN NaN
2  2018-02-17  01:45:24    22162 NaN  69.7 NaN NaN
3  2018-02-17  01:45:55    74034 NaN  67.5 NaN NaN
4  2018-02-17  01:46:29    74034 NaN  65.4 NaN NaN
5  2018-02-17  01:47:20    74034 NaN  63.3 NaN NaN
6  2018-02-17  01:48:35    74034 NaN  61.3 NaN NaN
7  2018-02-17  01:49:08    17448 NaN  63.4 NaN NaN
8  2018-02-17  01:49:31    17448 NaN  65.5 NaN NaN
9  2018-02-17  01:49:55    17448 NaN  67.6 NaN NaN

columns = ['A', 'C', 'D']
for clm in columns:
    df[clm] = df[clm].fillna(df.machine.map(df.groupby('machine')['B'].mean().to_dict()))

Results in

         data      hora  machine          A     B          C          D
0  2017-08-18  09:22:33    22162  65.366667  65.9  65.366667  65.366667
1  2017-10-03  11:08:26    22162  65.366667  60.5  65.366667  65.366667
2  2018-02-17  01:45:24    22162  65.366667  69.7  65.366667  65.366667
3  2018-02-17  01:45:55    74034  64.375000  67.5  64.375000  64.375000
4  2018-02-17  01:46:29    74034  64.375000  65.4  64.375000  64.375000
5  2018-02-17  01:47:20    74034  64.375000  63.3  64.375000  64.375000
6  2018-02-17  01:48:35    74034  64.375000  61.3  64.375000  64.375000
7  2018-02-17  01:49:08    17448  65.500000  63.4  65.500000  65.500000
8  2018-02-17  01:49:31    17448  65.500000  65.5  65.500000  65.500000
9  2018-02-17  01:49:55    17448  65.500000  67.6  65.500000  65.500000

Probably not the best way but gets the job done.
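
For reference, a self-contained sketch of the same idea written with `groupby(...).transform("mean")` instead of `map` plus a dict. `transform` returns a Series aligned to the original rows, so it can be passed to `fillna` directly. The frame is rebuilt here from the table above, without the `data` and `hora` columns; this is one possible variant, not a replacement for the code above:

```python
import numpy as np
import pandas as pd

# Rebuild the example frame (date/time columns omitted for brevity).
df = pd.DataFrame({
    "machine": [22162] * 3 + [74034] * 4 + [17448] * 3,
    "A": np.nan,
    "B": [65.9, 60.5, 69.7, 67.5, 65.4, 63.3, 61.3, 63.4, 65.5, 67.6],
    "C": np.nan,
    "D": np.nan,
})

# Mean of B per machine, repeated on every row of that machine.
group_mean_b = df.groupby("machine")["B"].transform("mean")

# Fill the empty columns with that row-aligned Series.
for col in ["A", "C", "D"]:
    df[col] = df[col].fillna(group_mean_b)

print(df)
```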

Man, it is not exactly what I was looking for. I already got what I wanted with the help of others. However, since it is not naive and it is cool, I'll leave it as the answer! :D — Marlon Henrique Teixeira, May 29 '19 at 20:45
Can you post what you did? I imagine there's a better way & I'd love to see a proper solution @MarlonHenriqueTeixeira — Vink, May 30 '19 at 13:50
I was in fact simply trying to fill all the NaNs with the respective column mean, and not with B.mean. :) But you did something cool! Thanks! — Marlon Henrique Teixeira, May 31 '19 at 14:00

score 0 · Answer 2 · answered May 31 '19 at 14:02

This is how I've solved my problem:

grupo = df.groupby(df["Machine"])
cada_maquina = list(grupo)

for i in range(3):
    cada_maquina[i][1].fillna(cada_maquina[i][1].mean(), inplace=True)
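
As far as I can tell, the loop above fills the per-machine frames held in `cada_maquina`, so the filled values live there rather than in `df`. If the goal is to fill `df` itself with each column's own per-machine mean, one possible sketch is below. The column names `temp` and `pressure` and all values are made up for illustration, since the question does not show the real headers:

```python
import numpy as np
import pandas as pd

# Made-up data: two value columns with some gaps per machine.
df = pd.DataFrame({
    "Machine": ["22162", "22162", "22162", "74034", "74034", "17448", "17448"],
    "temp": [1.0, np.nan, 3.0, 10.0, np.nan, np.nan, 7.0],
    "pressure": [np.nan, 4.0, 6.0, np.nan, 8.0, 5.0, np.nan],
})

value_cols = ["temp", "pressure"]  # hypothetical column names

# Per-machine mean of each column, aligned to the original rows.
group_means = df.groupby("Machine")[value_cols].transform("mean")

# fillna aligns on index and columns, so each NaN gets the mean of
# its own column within its own machine group.
df[value_cols] = df[value_cols].fillna(group_means)

print(df)
```

Note that a column that is entirely NaN within a group stays NaN, because its group mean is NaN as well.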

Thank you very much for every comment! :D

Answered by Marlon Henrique Teixeira
