How to create a new column in a dataframe using a multikey dictionary

Question

I have a multikey dictionary that I would like to use to create a new column in a dataframe. Doing so with a single key dataframe is quite easy but I am stumped as to what the correct syntax is to send two values to the dictionary.

I have been able to use a single key dictionary using map, get, and apply (map example)

    import pandas as pd      
    df = pd.DataFrame(data = {'Col1': [1, 2, 3, 4], 'Col2': ['A', 'B', 'C', 'D']})

    single_dict = {1: 'This', 2: 'is', 3: 'pretty', 4: 'easy'}

    df['newcol_a'] = df['Col1'].map(single_dict)

    print(df)```

which returns the expected"

    Col1 Col2 newcol_a
    0     1    A     This
    1     2    B       is
    2     3    C   pretty
    3     4    D     easy

But when I create a multikey dictionary such as

dbl_dict = {1: {'A': 'THIS', 'B': 'blah', 'C': 'blah', 'D': 'blah'},
            2: {'A': 'blah', 'B': 'HAS' , 'C': 'blah', 'D': 'blah'},
            3: {'A': 'blah', 'B': 'blah', 'C': 'ME'  , 'D': 'blah'},
            4: {'A': 'blah', 'B': 'blah', 'C': 'blah', 'D': 'STUMPED'},}

I am able to call it using 'get'

dbl_dict.get(1, {}).get('A', 'Other')
Out[5]: 'THIS'

But I cannot figure out the syntax (tried about 40 different things, such as df['newcol_b'] = df[['Col1', 'Col2']].map(dbl_dict) ) to get the desired results:

    Col1 Col2 newcol_a
    0     1    A     THIS
    1     2    B      HAS
    2     3    C       ME
    3     4    D  STUMPED

DeepSpace · Answer 1 · 2020-01-29T12:40:49.967

map does not know how to handle a nested dict. If you insist on using this dict you can use apply on the entire dataframe but you'd have to create a custom mapping function:

import pandas as pd

df = pd.DataFrame(data={'Col1': [1, 2, 3, 4], 'Col2': ['A', 'B', 'C', 'D']})
dbl_dict = {1: {'A': 'THIS', 'B': 'blah', 'C': 'blah', 'D': 'blah'},
            2: {'A': 'blah', 'B': 'HAS', 'C': 'blah', 'D': 'blah'},
            3: {'A': 'blah', 'B': 'blah', 'C': 'ME', 'D': 'blah'},
            4: {'A': 'blah', 'B': 'blah', 'C': 'blah', 'D': 'STUMPED'}}

df['new_col'] = df.apply(lambda s: dbl_dict.get(s['Col1'], {}).get(s['Col2']), axis=1)

df is now

   Col1 Col2  new_col
0     1    A     THIS
1     2    B      HAS
2     3    C       ME
3     4    D  STUMPED

A solution with loc (or at) might be possible (and if so, will probably be faster). Need to look into that.

Thank you, this certainly gets the code to run. But your comment " If you insist on using this dict" makes me ask, would you recommend a different approach? The data that I am working with requires conditionally grouping the data based on the values of 2 or 3 columns for both analysis and reporting. Do you have other suggestions? I should add that my background is in SAS and quite often the best tool for this in SAS was the Format function -which is similar to dictionaries and how I ended up with this approach. again, thanks. — 29Clyde, Jan 29 '20 at 12:40

score 0 · Answer 2 · answered Jan 29 '20 at 12:29

The easiest option you have, in my opinion, is to create a new DataFrame using your nested dictionary and unstack this DataFrame, which you can then join with your original DataFrame, like so:

s = pd.DataFrame(dbl_dict).unstack().rename_axis(('Col1','Col2')).rename('new_column')
print (s)
df = df.join(s, on=['Col1','Col2'])
print (df)

How to create a new column in a dataframe using a multikey dictionary

2 Answers2