Filling missing data in pandas dataframe on the basis of a value in another column

Question

I'm working on filling missing values in a pandas dataframe that I have. The simplest case would be to use df.fillna() on the entire column with missing values. This works, but I would like to extend this by adding a condition.

Namely, I calculate the median based on the category and if a missing value has Label = 'A' then it is filled with the median of all values that have Label = 'A'

Likewise, if a value has Label = 'B', then it is filled with the median of all values that have Label = 'B'

Below is workable code.

df = pd.DataFrame([[20,'A'],[24,'B'],[23,'A'],[22,'B'],[np.nan,'A'],[24,'B'],[np.nan,'B']],columns=['Value','Label'])

A_median = np.nanmedian(df[df['Label']=='A']['Value'])
B_median = np.nanmedian(df[df['Label']=='B']['Value'])

Currently the dataframe is:

    Value   Label
0   20.0    A
1   24.0    B
2   23.0    A
3   22.0    B
4   NaN     A
5   24.0    B
6   NaN     B

The desired output is:

    Value   Label
0   20.0    A
1   24.0    B
2   23.0    A
3   22.0    B
4   21.5    A
5   24.0    B
6   24.0    B

I tried the following but it causes all other values to be NaN:

df['Value'] = df[df['Label'] == 'A']['Value'].fillna(A_median)

Thank you in advance for your help! Please let me know if you need further information :)

Edit: I know similar questions have been asked but I mainly am trying to understand how to set the value without causing the other category to be null.

use `df['Value'].fillna(df.groupby('Label')['Value'].transform('mean'),inplace=True)` — ansev, Sep 08 '19 at 21:41
Use `df['Value'] = df['Value'].fillna(df.groupby('Label')['Value'].transform('median'))` — Erfan, Sep 08 '19 at 21:42
Thank you guys. I searched for a while and never found that question but it solves the issue perfectly. — Nicklovn, Sep 08 '19 at 21:43

Filling missing data in pandas dataframe on the basis of a value in another column

0 Answers0