0

I have a multikey dictionary that I would like to use to create a new column in a dataframe. Doing so with a single key dataframe is quite easy but I am stumped as to what the correct syntax is to send two values to the dictionary.

I have been able to use a single key dictionary using map, get, and apply (map example)

    import pandas as pd      
    df = pd.DataFrame(data = {'Col1': [1, 2, 3, 4], 'Col2': ['A', 'B', 'C', 'D']})

    single_dict = {1: 'This', 2: 'is', 3: 'pretty', 4: 'easy'}

    df['newcol_a'] = df['Col1'].map(single_dict)

    print(df)```

which returns the expected"

    Col1 Col2 newcol_a
    0     1    A     This
    1     2    B       is
    2     3    C   pretty
    3     4    D     easy

But when I create a multikey dictionary such as

dbl_dict = {1: {'A': 'THIS', 'B': 'blah', 'C': 'blah', 'D': 'blah'},
            2: {'A': 'blah', 'B': 'HAS' , 'C': 'blah', 'D': 'blah'},
            3: {'A': 'blah', 'B': 'blah', 'C': 'ME'  , 'D': 'blah'},
            4: {'A': 'blah', 'B': 'blah', 'C': 'blah', 'D': 'STUMPED'},}

I am able to call it using 'get'

dbl_dict.get(1, {}).get('A', 'Other')
Out[5]: 'THIS'      

But I cannot figure out the syntax (tried about 40 different things, such as df['newcol_b'] = df[['Col1', 'Col2']].map(dbl_dict) ) to get the desired results:

    Col1 Col2 newcol_a
    0     1    A     THIS
    1     2    B      HAS
    2     3    C       ME
    3     4    D  STUMPED
29Clyde
  • 33
  • 3

2 Answers2

2

map does not know how to handle a nested dict. If you insist on using this dict you can use apply on the entire dataframe but you'd have to create a custom mapping function:

import pandas as pd

df = pd.DataFrame(data={'Col1': [1, 2, 3, 4], 'Col2': ['A', 'B', 'C', 'D']})
dbl_dict = {1: {'A': 'THIS', 'B': 'blah', 'C': 'blah', 'D': 'blah'},
            2: {'A': 'blah', 'B': 'HAS', 'C': 'blah', 'D': 'blah'},
            3: {'A': 'blah', 'B': 'blah', 'C': 'ME', 'D': 'blah'},
            4: {'A': 'blah', 'B': 'blah', 'C': 'blah', 'D': 'STUMPED'}}

df['new_col'] = df.apply(lambda s: dbl_dict.get(s['Col1'], {}).get(s['Col2']), axis=1)

df is now

   Col1 Col2  new_col
0     1    A     THIS
1     2    B      HAS
2     3    C       ME
3     4    D  STUMPED

A solution with loc (or at) might be possible (and if so, will probably be faster). Need to look into that.

DeepSpace
  • 78,697
  • 11
  • 109
  • 154
  • Thank you, this certainly gets the code to run. But your comment " If you insist on using this dict" makes me ask, would you recommend a different approach? The data that I am working with requires conditionally grouping the data based on the values of 2 or 3 columns for both analysis and reporting. Do you have other suggestions? I should add that my background is in SAS and quite often the best tool for this in SAS was the Format function -which is similar to dictionaries and how I ended up with this approach. again, thanks. – 29Clyde Jan 29 '20 at 12:40
0

The easiest option you have, in my opinion, is to create a new DataFrame using your nested dictionary and unstack this DataFrame, which you can then join with your original DataFrame, like so:

s = pd.DataFrame(dbl_dict).unstack().rename_axis(('Col1','Col2')).rename('new_column')
print (s)
df = df.join(s, on=['Col1','Col2'])
print (df)
Robo Bayer
  • 267
  • 2
  • 11