2

I am hoping for some help in summarizing the dataframe detailed below into a one row summary as shown in desired output further down on the page. Many thanks in advance.

employees = {'Name of Employee': ['Mark','Mark','Mark','Mark','Mark','Mark', 'Mark','Mark','Mark','Mark','Mark','Mark','Mark'],
                         'Department': ['21','21','21','21','21','21', '21','21','21','21','21','21','21'],
                         'Team': ['2','2','2','2','2','2','2','2','2','2','2','2','2'],
                         'Log': ['2020-02-19 09:01:17', '2020-02-19 09:54:02', '2020-04-10 11:00:31', '2020-04-11 12:39:08', '2020-04-18 09:45:22', '2020-05-05 09:01:17', '2020-05-23 09:54:02', '2020-07-03 11:00:31', '2020-07-03 12:39:08', '2020-07-04 09:45:22', '2020-07-05 09:01:17', '2020-07-06 09:54:02', '2020-07-06 11:00:31'],
                         'Call Duration' : ['0.01178', '0.01736','0.01923','0.00911','0.01007','0.01206','0.01256','0.01006','0.01162','0.00733','0.01250','0.01013','0.01308'],
                         'ITT': ['NO','YES', 'NO', 'Follow up', 'YES','YES', 'NO', 'Follow up','YES','YES', 'NO','YES','YES']
                        }
            
df = pd.DataFrame(employees)
    

Desired output:

Name  Dept  Team     Start      End      Weeks Total Calls  Ave. Call time  Sold  Rejected  more info
Mark   21    2    2020-02-19 2020-07-06  19.71      13          0.01207       7       4        2

The logic I am seeking to apply is (although I'm guessing there are errors in the syntax I have written below, I hope that you are still able to understand the calculations):

  • Start = min date in df['Log']
  • End = max date in df['Log']
  • Weeks = (max date in df['log'] - min date in df['Log'])/7
  • Total Calls = df['Log'].count
  • Ave. Call time = (df['Call Duration'].sum)/(df['Log'].count)
  • Sold = (df['ITT']=='YES').count
  • Rejected = (df['ITT']=='NO').count
  • more info = (df['ITT']=='Follow up').count
windwalker
  • 359
  • 4
  • 14

2 Answers2

3

Try this using pd.NamedAgg with groupby:

df['Log'] = pd.to_datetime(df['Log'])
df['Call Duration'] = df['Call Duration'].astype(float)

df.groupby(['Name of Employee', 'Team', 'Department'])\
  .agg(Start = ('Log','min'),
       End = ('Log', 'max'),
        Weeks = ('Log', lambda x: np.ptp(x) / np.timedelta64(7, 'D')),
        Total_Calls = ('Log', 'count'),
        Avg_Call_Time = ('Call Duration', 'mean'),
        Sold = ('ITT', lambda x: (x == 'YES').sum()),
        Rejected = ('ITT', lambda x: (x == 'NO').sum()),
        More_info = ('ITT', lambda x: (x=='Follow up').sum()))

Output:

                                               Start                 End      Weeks  Total_Calls  Avg_Call_Time  Sold  Rejected  More_info
Name of Employee Team Department                                                                                                          
Mark             2    21         2020-02-19 09:01:17 2020-07-06 11:00:31  19.726114           13       0.012068     7         4          2
Scott Boston
  • 147,308
  • 15
  • 139
  • 187
  • 1
    Thanks that is absolutely brilliant. I made one small change to your code so that it included 'Team' df.groupby(['Name of Employee', 'Department', 'Team'] – windwalker Oct 07 '20 at 20:17
  • @windwalker Thanks. Yep, I missed that. :) I added it now. Happy coding. Be safe and stay healthy. – Scott Boston Oct 07 '20 at 20:18
  • I have noticed that the groupby headers are not treated as columns and therefore disappear when saved into a database. How can the groupby data be converted to columns and displayed like the rest? – windwalker Oct 07 '20 at 22:08
  • 1
    My previous question was somewhat premature. After some trial and error, I found that df.reset_index() done the trick of converting the groupby data to df columns. – windwalker Oct 07 '20 at 22:36
0

U had syntax error, where you forgot to put commas at the end of each key. And now u can work on this dataframe.

import pandas as pd
    employees = {'Name=': ['Mark','Mark','Mark','Mark','Mark','Mark', 'Mark','Mark','Mark','Mark','Mark','Mark','Mark'],
                             'Department': ['21','21','21','21','21','21', '21','21','21','21','21','21','21'],
                             'Team': ['2','2','2','2','2','2','2','2','2','2','2','2','2'],
                             'Log': ['2020-02-19 09:01:17', '2020-02-19 09:54:02', '2020-04-10 11:00:31', '2020-04-11 12:39:08', '2020-04-18 09:45:22', '2020-05-05 09:01:17', '2020-05-23 09:54:02', '2020-07-03 11:00:31', '2020-07-03 12:39:08', '2020-07-04 09:45:22', '2020-07-05 09:01:17', '2020-07-06 09:54:02', '2020-07-06 11:00:31'],
                             'Call Duration' : ['0.01178', '0.01736','0.01923','0.00911','0.01007','0.01206','0.01256','0.01006','0.01162','0.00733','0.01250','0.01013','0.01308'],
                             'ITT': ['NO','YES', 'NO', 'Follow up', 'YES','YES', 'NO', 'Follow up','YES','YES', 'NO','YES','YES']
                            }
                
    df = pd.DataFrame(employees)
    print(df)

Output:-

              Name  Department  ... Call Duration        ITT
              Mark         21  ...       0.01178         NO
              Mark         21  ...       0.01736        YES
              Mark         21  ...       0.01923         NO
              Mark         21  ...       0.00911  Follow up
              Mark         21  ...       0.01007        YES
              Mark         21  ...       0.01206        YES
              Mark         21  ...       0.01256         NO
              Mark         21  ...       0.01006  Follow up
              Mark         21  ...       0.01162        YES
              Mark         21  ...       0.00733        YES
              Mark         21  ...       0.01250         NO
              Mark         21  ...       0.01013        YES
              Mark         21  ...       0.01308        YES

[13 rows x 6 columns]
Vishal Kamlapure
  • 590
  • 4
  • 16
  • thanks for spotting that oversight on my part. Are you able to advise how I can go about producing the one row summary? – windwalker Oct 07 '20 at 19:57