
I'm working with probabilities that correspond to certain categories and I would like to map them to the categories of interest in a new column of a pandas DataFrame.

I would normally use pandas.Series.map for such a task, but the probabilities were truncated when processed in another language, so an exact lookup no longer works.

Is it possible to combine pd.Series.map and np.isclose so that the following example works as needed? Alternative approaches would also be appreciated!

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3],
    'prob': np.round([0.6**(1/30.), 0.9**(1/10.), 0.8**(1/20.)], decimals = 4)
    })

prob_dict = {
    0.9**(1/10.): 'catA', 
    0.6**(1/30.): 'catB', 
    0.8**(1/20.): 'catC'}

df['cat'] = df.prob.map(prob_dict)

>> df
>>    a    prob  cat
>> 0  1  0.9831  NaN
>> 1  2  0.9895  NaN
>> 2  3  0.9889  NaN

My required output is:

>> df
>>    a    prob   cat
>> 0  1  0.9831  catB
>> 1  2  0.9895  catA
>> 2  3  0.9889  catC
p-robot

2 Answers


You have your keys and values mixed up.

prob_dict = {v: k for k, v in prob_dict.items()}

df['cat'] = df.prob.map(prob_dict)
print(df)

   a      prob   cat
0  1  0.983117  catB
1  2  0.989519  catA
2  3  0.988905  catC
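
With the probabilities rounded to 4 decimal places, as in the edited question, an exact lookup no longer finds the unrounded keys. One possible workaround is to round the dictionary keys in the same way before mapping. This is only a sketch: it assumes the rounding precision (4 decimals) is known and that no two keys collide after rounding. The name `rounded_dict` is introduced here purely for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3],
    'prob': np.round([0.6**(1/30.), 0.9**(1/10.), 0.8**(1/20.)], decimals=4),
})

prob_dict = {
    0.9**(1/10.): 'catA',
    0.6**(1/30.): 'catB',
    0.8**(1/20.): 'catC',
}

# Round the keys with the same function and precision used for the column,
# so that both sides of the lookup are bit-for-bit identical floats.
# Assumption: 4 decimals is the known precision and rounded keys stay unique.
rounded_dict = {np.round(k, 4): v for k, v in prob_dict.items()}

df['cat'] = df['prob'].round(4).map(rounded_dict)
print(df)
```

If the precision of the truncation is not known in advance, a tolerance-based comparison such as np.isclose is probably the safer choice.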
piRSquared
  • My apologies. You are right and I've edited the question to round them to 4 dp now which brings the code in line with the question. Thanks and +1 for pointing this out. – p-robot Jan 17 '17 at 16:15

You can use np.isclose with a specified absolute tolerance (here atol=0.0001) after reshaping the values of the prob column into a 2-D column vector, so that they broadcast against the dictionary keys.

Each probability is compared against every key of the dictionary, and the result is True wherever a close match is found.

cond = np.isclose(df.prob.values[:, None], list(prob_dict.keys()), atol=10**-4)
indi = np.argwhere(cond)[:, 1]     # Column index of the matching key for each row (assumes exactly one match per row)
df['cat'] = np.array(list(prob_dict.values()))[indi]  # Look up the category belonging to each matched key
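
The np.argwhere step only lines up with the DataFrame if every row matches exactly one key. As a hedged variation on the same idea, the sketch below leaves rows without a close match empty instead of raising a length mismatch. The extra row with prob 0.5 and the names `keys`, `cats`, `matched` and `first` are assumptions added for illustration, not part of the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3, 4],
    # The last value is a hypothetical probability with no matching key.
    'prob': [0.9831, 0.9895, 0.9889, 0.5],
})

prob_dict = {
    0.9**(1/10.): 'catA',
    0.6**(1/30.): 'catB',
    0.8**(1/20.): 'catC',
}

keys = np.array(list(prob_dict.keys()))
cats = np.array(list(prob_dict.values()), dtype=object)

# Boolean matrix of shape (n_rows, n_keys): True where a prob is close to a key.
cond = np.isclose(df['prob'].to_numpy()[:, None], keys, atol=1e-4)

matched = cond.any(axis=1)     # rows with at least one close key
first = cond.argmax(axis=1)    # index of the first close key (0 if none)

# Take the category of the first close key, or None where nothing matched.
df['cat'] = np.where(matched, cats[first], None)
print(df)
```

If two keys lie within the tolerance of the same probability, this picks the first one in dictionary order, so atol should be chosen smaller than the gap between neighbouring keys.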


p-robot
Nickil Maveli
  • This works. Thanks. As pointed out by piRSquared, my question had keys and values the wrong way around so I've edited your response accordingly which solves the problem. Thanks to both of you. – p-robot Jan 17 '17 at 16:19