Mapping dict keys to column of pandas dataframe if they're close

Question

I'm working with probabilities that correspond to certain categories and I would like to map them to the categories of interest in a new column of a pandas DataFrame.

I would normally use pandas.Series.map for such a task but the probabilities have been truncated when processed in another language and so this doesn't work.

I would like to know if it's possible to combine pd.Series.map and np.isclose together so that the following example will work as needed? Any alternative approaches would be appreciated also!

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3],
    'prob': np.round([0.6**(1/30.), 0.9**(1/10.), 0.8**(1/20.)], decimals = 4)
    })

prob_dict = {
    0.9**(1/10.): 'catA', 
    0.6**(1/30.): 'catB', 
    0.8**(1/20.): 'catC'}

df['cat'] = df.prob.map(prob_dict)

>> df
>>    a    prob  cat
>> 0  1  0.9831  NaN
>> 1  2  0.9895  NaN
>> 2  3  0.9889  NaN

The output I need is:

>> df
>>    a    prob   cat
>> 0  1  0.9831  catB
>> 1  2  0.9895  catA
>> 2  3  0.9889  catC

Would there be multiple categories having their thresholds (closeness) overlapping or would they be spread out a considerable delta apart? — Nickil Maveli, Jan 17 '17 at 15:06
Spread out a reasonable amount. Certainly not within 2 dp of each other. — p-robot, Jan 17 '17 at 16:18

score 2 · Answer 1 · answered Jan 17 '17 at 14:43

You have your keys and values mixed up.

prob_dict = {v: k for k, v in prob_dict.items()}

df['cat'] = df.prob.map(prob_dict)
print(df)

   a      prob   cat
0  1  0.983117  catB
1  2  0.989519  catA
2  3  0.988905  catC

answered Jan 17 '17 at 14:43

piRSquared

My apologies. You are right and I've edited the question to round them to 4 dp now which brings the code in line with the question. Thanks and +1 for pointing this out. – p-robot Jan 17 '17 at 16:15

score 2 · Accepted Answer · edited Jan 17 '17 at 16:21

You can use np.isclose with an absolute tolerance (here atol=10**-4) after reshaping the values of the prob column into a 2-D column vector, so that they broadcast against the dictionary keys.

Each value is compared against every key in prob_dict.keys(), which gives a boolean array that is True wherever a close match is found. The column index of each match then selects the corresponding entry of prob_dict.values().

cond = np.isclose(df.prob.values[:, None], list(prob_dict.keys()), atol=10**-4)
indi = np.argwhere(cond)[:, 1]     # Column index of the matching key for each row
df['cat'] = np.array(list(prob_dict.values()))[indi]  # Look up the category for each matched key
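
Note that the snippet above relies on every row matching exactly one key: np.argwhere returns one entry per True cell, so a row with no match (or with two) makes indi a different length from the DataFrame and the assignment fails. A minimal sketch of a more defensive variant follows. The extra row with prob 0.5, the explicit rtol=0, and the names keys, cats, has_match and first_match are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same data as in the question, plus one extra row (0.5) that matches no key.
df = pd.DataFrame({
    'a': [1, 2, 3, 4],
    'prob': np.round([0.6**(1/30.), 0.9**(1/10.), 0.8**(1/20.), 0.5], decimals=4),
})

prob_dict = {
    0.9**(1/10.): 'catA',
    0.6**(1/30.): 'catB',
    0.8**(1/20.): 'catC',
}

keys = np.array(list(prob_dict.keys()))
cats = np.array(list(prob_dict.values()), dtype=object)

# cond[i, j] is True when row i's probability is within atol of key j.
# rtol=0 (an assumption here) makes the comparison purely absolute.
cond = np.isclose(df['prob'].to_numpy()[:, None], keys, rtol=0, atol=1e-4)

has_match = cond.any(axis=1)       # does the row match any key at all?
first_match = cond.argmax(axis=1)  # index of the first matching key (0 if none)

# Take the category of the first match, then blank out rows without a match.
df['cat'] = pd.Series(cats[first_match], index=df.index).where(has_match)
print(df)
```

Rows without a close key end up as NaN instead of raising an error, which mirrors what pd.Series.map does for missing keys.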

edited Jan 17 '17 at 16:21

p-robot

answered Jan 17 '17 at 14:46

Nickil Maveli

This works. Thanks. As pointed out by piRSquared, my question had keys and values the wrong way around so I've edited your response accordingly which solves the problem. Thanks to both of you. – p-robot Jan 17 '17 at 16:19
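
Since the categories are said to be spread well apart, another option that stays close to pd.Series.map might be to round the dictionary keys to the same precision as the column and map on the rounded values. This is only a sketch. It assumes the stored probabilities were rounded (not floored) to 4 decimal places, as in the question's example, and the names DECIMALS and rounded_dict are made up for illustration.

```python
import numpy as np
import pandas as pd

DECIMALS = 4  # assumed precision of the stored probabilities

df = pd.DataFrame({
    'a': [1, 2, 3],
    'prob': np.round([0.6**(1/30.), 0.9**(1/10.), 0.8**(1/20.)], decimals=DECIMALS),
})

prob_dict = {
    0.9**(1/10.): 'catA',
    0.6**(1/30.): 'catB',
    0.8**(1/20.): 'catC',
}

# Round the keys with the same function used for the column so both sides
# end up as identical floats and exact-match lookup works again.
rounded_dict = {np.round(k, DECIMALS): v for k, v in prob_dict.items()}

df['cat'] = df['prob'].round(DECIMALS).map(rounded_dict)
print(df)
```

This would break down if two keys rounded to the same value, in which case the later key silently overwrites the earlier one in rounded_dict, so it is worth checking that len(rounded_dict) == len(prob_dict) before relying on it.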
