Maximum occurrences in a list

Question

I have a data frame like the one below.

import pandas as pd

data = {'Date': ['2022/09/01', '2022/09/02', '2022/09/03', '2022/09/04', '2022/09/05','2022/09/01', '2022/09/02', '2022/09/03', '2022/09/04', '2022/09/05','2022/09/01', '2022/09/02', '2022/09/03', '2022/09/04', '2022/09/05'],
        'Runner': ['Runner A', 'Runner A', 'Runner A', 'Runner A', 'Runner A','Runner B', 'Runner B', 'Runner B', 'Runner B', 'Runner B','Runner C', 'Runner C', 'Runner C', 'Runner C', 'Runner C'],
        'Training Time': ['less than 1 hour', 'less than 1 hour', 'less than 1 hour', 'less than 1 hour', '1 hour to 2 hour','less than 1 hour', '1 hour to 2 hour', 'less than 1 hour', '1 hour to 2 hour', '2 hour to 3 hour', '1 hour to 2 hour ', '2 hour to 3 hour' ,'1 hour to 2 hour ', '2 hour to 3 hour', '2 hour to 3 hour']
        }

df = pd.DataFrame(data)

I have counted the occurrence for each runner using the below code

s = df.groupby(['Runner','Training Time']).size()
s

The result is like below. Each Runner

And when I use below code to get the max occurence.

df = s.loc[s.groupby(level=0).idxmax()].reset_index().drop(0,axis=1)
df

Result Max Occurrence

The problem is on Runner B. It should show "1 hour to 2 hour" and "less than 1 hour". But now it only shows ""1 hour to 2 hour"

How can I fix the issue? Thanks. Expected Result

But I want to find out the max occurrence of daily training hours each runner in this time period — PJL, Oct 18 '22 at 04:07
My result for Runner A and Runner C is perfect. The problem is Runner B -->when there are same occurrences for two items, how can I be alerted ?? — PJL, Oct 18 '22 at 04:10

Ashish Jain · Answer 1 · 2022-10-18T04:38:01.583

import pandas as pd

data = {'Date': ['2022/09/01', '2022/09/02', '2022/09/03', '2022/09/04', '2022/09/05','2022/09/01', '2022/09/02', '2022/09/03', '2022/09/04', '2022/09/05','2022/09/01', '2022/09/02', '2022/09/03', '2022/09/04', '2022/09/05'],
        'Runner': ['Runner A', 'Runner A', 'Runner A', 'Runner A', 'Runner A','Runner B', 'Runner B', 'Runner B', 'Runner B', 'Runner B','Runner C', 'Runner C', 'Runner C', 'Runner C', 'Runner C'],
        'Training Time': ['less than 1 hour', 'less than 1 hour', 'less than 1 hour', 'less than 1 hour', '1 hour to 2 hour','less than 1 hour', '1 hour to 2 hour', 'less than 1 hour', '1 hour to 2 hour', '2 hour to 3 hour', '1 hour to 2 hour ', '2 hour to 3 hour' ,'1 hour to 2 hour ', '2 hour to 3 hour', '2 hour to 3 hour']
        }

df = pd.DataFrame(data)
s = df.groupby(['Runner', 'Training Time'], as_index=False).size()
s.columns = ['Runner', 'Training Time', 'Size']

r = s.groupby(['Runner'], as_index=False)['Size'].max()

df_list = []
for index, row in r.iterrows():
    temp_df = s[(s['Runner'] == row['Runner']) & (s['Size'] == row['Size'])]
    df_list.append(temp_df)

df_report = pd.concat(df_list)
print(df_report)
    
df_report.to_csv('report.csv', index = False)

Yes looks good in the output but when I export to excel, it only las last two rows, ie header and Runner C only. Any idea I can export all the runners? — PJL, Oct 18 '22 at 04:35
Saved the df_report in a CSV file, please check the edited answer. — Ashish Jain, Oct 18 '22 at 04:38
Yes run again. This code is perfect. sorry for misunderstanding just now — PJL, Oct 18 '22 at 04:39

Joran Beasley · Accepted Answer · 2022-10-18T04:26:12.770

0

thsi is the best i could come up with ... its kind of gross

def agg_most_common(vals):
    print("vals")
    matches = []
    for i in collections.Counter(vals).most_common():
        if not matches or matches[0][1] == i[1]:
            matches.append(i)
        else:
            break
    return [x[0] for x in matches]

print(df.groupby('Runner')['Training Time'].agg(agg_most_common))

edited Oct 18 '22 at 04:26

answered Oct 18 '22 at 04:12

Joran Beasley

110,522
12
160
179

Correct me if I am wrong. Not showing Max occurrence result. – PJL Oct 18 '22 at 04:20
@PJL I misunderstood what you were asking ... i think i corrected it – Joran Beasley Oct 18 '22 at 04:27
Thanks Joran. I get the message --name 'collections' is not defined. Believe I have lost somewhere. – PJL Oct 18 '22 at 04:40
`import collections` should solve that – Joran Beasley Oct 18 '22 at 06:08
Let me try tomorrow , will comment thanks – PJL Oct 18 '22 at 14:03
For the output it is perfect but if export to excel, seems not working, right? – PJL Oct 19 '22 at 04:39

Maximum occurrences in a list

2 Answers2