Fill missing values using a nested dictionary

Question

Here is my sample dataframe:

import numpy as np
import pandas as pd

df = pd.DataFrame(data=[[3, np.nan, np.nan], [5, np.nan, np.nan]], index=['country1', 'country2'], columns=[2021, 2022, 2023])

Here is my sample dictionary:

d = {'country1': {'key1': 'a', 'key2': 'assumed','key3': {2022: '10', 2023: ' 20'}}, 'country2': {'key1': 'b', 'key2': 'assumed', 'key3': {2022: '30', 2023: ' 40'}}}

I am aiming to use the dictionary d to replace the missing values in the dataframe df. I thought I'd use something like:

df.fillna(d2)

where d2 is a dictionary based on dictionary d:

d2 = {'country1': {2022: '10', 2023: ' 20'}, 'country2': {2022: '30', 2023: ' 40'}}

I don't know how to generate d2 from d, and df.fillna(d2) doesn't work anyway.

The desired result would look like this:

pd.DataFrame(data=[[3, 10, 20],[5, 30, 40]], index=['country1', 'country2'], columns=[2021, 2022, 2023])

It's unclear what you're trying to achieve. Please provide your desired final output. — OD1995, Mar 24 '22 at 17:18

Shubham Sharma · Accepted Answer · score 3 · 2022-03-24T18:33:42.780

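We can still use fillna, but first we have to transform the dictionary into a format suitable for fillna. When given a dictionary, fillna matches its keys to column labels, so we transpose the dataframe to put the countries in the columns, fill, and then transpose back: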

df.T.fillna({k: v['key3'] for k, v in d.items()}).T

Result

         2021 2022 2023
country1  3.0   10   20
country2  5.0   30   40
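
Note that the values in the dictionary are strings (some with a leading space), so the filled cells above hold strings rather than numbers. Here is a minimal sketch of one way to get numeric output with the same transpose-and-fill idea. It assumes every 'key3' value can be parsed by float(), and the name fill is only illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    data=[[3, np.nan, np.nan], [5, np.nan, np.nan]],
    index=["country1", "country2"],
    columns=[2021, 2022, 2023],
)
d = {
    "country1": {"key1": "a", "key2": "assumed", "key3": {2022: "10", 2023: " 20"}},
    "country2": {"key1": "b", "key2": "assumed", "key3": {2022: "30", 2023: " 40"}},
}

# Build {country: {year: number}}. float() ignores the leading
# whitespace in strings such as ' 20'. (Assumption: all values parse.)
fill = {
    country: {year: float(value) for year, value in info["key3"].items()}
    for country, info in d.items()
}

# fillna matches dict keys to column labels, so transpose to put the
# countries in the columns, fill, then transpose back.
result = df.T.fillna(fill).T
print(result)
```

With numeric fill values the columns stay floating point instead of being converted to object dtype.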

edited Mar 24 '22 at 18:33

answered Mar 24 '22 at 17:30

Shubham Sharma


Thanks! My df in real life has a multi-index with two levels. Level 0 is the continent and level 1 is the country. How do I change your solution for it to work with my real-life df? At the moment it's failing. I need fillna to disregard level 0 of the index, but I'd like to keep level 0 in the solution. – ric Mar 25 '22 at 14:27
I have solved it with an intermediate step where I get rid of the index level 0 and apply your solution to create a temporary fill_values dataframe, which I then pass to df.fillna(value=fill_values). If you have a better solution, please let me know. – ric Mar 25 '22 at 15:28
@ric Glad to help. I guess the solution you have already figured out is the way to go, but I'll let you know if there is a better way. – Shubham Sharma Mar 25 '22 at 15:31
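
For the MultiIndex case discussed in the comments above, here is a rough sketch of the intermediate fill_values approach. The continent labels and the level names are made up for illustration, and it assumes country names are unique in level 1 and that the 'key3' values can be parsed by float():

```python
import numpy as np
import pandas as pd

# Hypothetical layout: level 0 = continent, level 1 = country.
index = pd.MultiIndex.from_tuples(
    [("continent1", "country1"), ("continent2", "country2")],
    names=["continent", "country"],
)
df = pd.DataFrame(
    [[3, np.nan, np.nan], [5, np.nan, np.nan]],
    index=index,
    columns=[2021, 2022, 2023],
)
d = {
    "country1": {"key1": "a", "key2": "assumed", "key3": {2022: "10", 2023: " 20"}},
    "country2": {"key1": "b", "key2": "assumed", "key3": {2022: "30", 2023: " 40"}},
}

# One row per country, one column per year, with numeric values.
fill_values = pd.DataFrame.from_dict(
    {
        country: {year: float(value) for year, value in info["key3"].items()}
        for country, info in d.items()
    },
    orient="index",
)

# Look up each row's fill values by country (level 1) only, then give
# the result the full MultiIndex so that fillna can align on it.
fill_values = fill_values.reindex(df.index.get_level_values("country"))
fill_values.index = df.index

# fillna with a DataFrame aligns on both index and columns.
result = df.fillna(fill_values)
print(result)
```

The continent level is ignored during the lookup but kept in the result, because fill_values is given the same index as df before filling.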

score 1 · Answer 2 · answered Mar 24 '22 at 17:27


Rather than using df.fillna(d2), it looks like the best way of achieving this would be the following:

for country, country_dict in d.items():
    for year, value in country_dict['key3'].items():
        # The dictionary stores numbers as strings (e.g. ' 20'), so convert them first
        df.loc[country, year] = float(value)
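
Note that a loop like this assigns every value found under 'key3', whether or not the target cell is missing. If existing values should be preserved, as fillna would do, a guarded variant could look like the sketch below. It assumes the values can be parsed by float():

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    data=[[3, np.nan, np.nan], [5, np.nan, np.nan]],
    index=["country1", "country2"],
    columns=[2021, 2022, 2023],
)
d = {
    "country1": {"key1": "a", "key2": "assumed", "key3": {2022: "10", 2023: " 20"}},
    "country2": {"key1": "b", "key2": "assumed", "key3": {2022: "30", 2023: " 40"}},
}

for country, country_dict in d.items():
    for year, value in country_dict["key3"].items():
        # Only write into cells that are actually missing.
        if pd.isna(df.loc[country, year]):
            df.loc[country, year] = float(value)

print(df)
```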


OD1995

