1

I have 2 columns, called decision1 and decision2.

I want to compare them in order to get the highest occurrence between the 2, so I either get the highest occurrence in decision1 or decision2, according to the greatest one. So far my attempts led to this, but with no success since I just get the highest occurrence in EACH column and not combined

 # weightage option
if args['weightage'] == "yes":
    attr1 = data['decision'].value_counts().idxmax  #highest occurrence in decision
    attr2 = data['decision2'].value_counts().idxmax #highest occurrence in decision2
    heaviest_attribute = data.groupby(['decision','decision2']).size()

Ideally I would just need to use some kind of max() function between attr1 and attr2 but I don't know how to handle this.

For example, given this table

enter image description here

I want to compare decision1 and decision2 columns as if they were one column, and the expected output, in this case, would be 'Yes', because it's the most recurrent value.

3 Answers3

1

Here's a simple solution.

Its best to convert the content into a list and finding max occurance in list is simple.

import pandas as pd
data = pd.DataFrame({'decision': ['yes', 'maybe', 'yes', 'maybe', 'yes'], 
               'decision 2': ['No', 'No', 'Perhaps', 'Perhaps', 'unsure']
            })
a = list(data['decision'])+list(data['decision 2'])
a = max(set(a), key=a.count)
print(a)

Output:

yes
Srivatsav Raghu
  • 399
  • 4
  • 11
1

Use DataFrame.melt with Series.mode and select first value by position with Series.iat:

a = df[['decision','decision 2']].melt()['value'].mode().iat[0]

Or reshape by DataFrame.stack:

a = df[['decision','decision 2']].stack().mode().iat[0]

print (a)
Yes

Detail:

print (df[['decision','decision 2']].melt()['value'])
0        Yes
1      Maybe
2        Yes
3      Maybe
4        Yes
5         No
6         No
7    Perhaps
8    Perhaps
9     unsure
Name: value, dtype: object
print (df[['decision','decision 2']].stack())
0  decision          Yes
   decision 2         No
1  decision        Maybe
   decision 2         No
2  decision          Yes
   decision 2    Perhaps
3  decision        Maybe
   decision 2    Perhaps
4  decision          Yes
   decision 2     unsure
dtype: object

EDIT:

s = df.eq(a).any()

col = s.index[s][0]
print (col)
decision
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thanks, this works like a charm (especially `a = df[['decision','decision 2']].stack().mode().iat[0]`). One bonus question though: by getting this value, assuming each column has its own values so in our case 'Yes' will always be in decision, and never in decision2, can we retrieve the column name by using the value? – Andre Retro Pie Dec 10 '20 at 09:18
  • @AndreRetroPie - I think I understand, answer was edited. – jezrael Dec 10 '20 at 09:22
  • Thanks sir, you're my hero – Andre Retro Pie Dec 10 '20 at 09:33
0

Probably there exist a much more elegant solution...

df = pd.DataFrame({'decision': ['Yes', 'Maybe', 'Yes', 'Maybe', 'Yes'], 
                   'decision 2': ['No', 'No', 'Perhaps', 'Perhaps', 'unsure']})

d1 = dict(df.groupby('decision').count().loc[:, 'decision 2'])
d2 = dict(df.groupby('decision 2').count().loc[:, 'decision'])

d1.update(d2)

max(d1, key=d1.get)
yxteh
  • 126
  • 1
  • 8