0

Suppose I have a dataframe as follows:

                 (MM/DD/YYYY)
Name    Amount    Start_date        End_date         Country
John    11        01/01/2022        01/02/2022       USA
Betty   13        01/01/2022        04/16/2022       China
Bruce   17        01/01/2022        05/17/2022       Germany
John    19        02/01/2022        03/19/2022       USA
Betty   27        04/17/2022        06/30/2022       China

In this dataframe, the date columns are of datatype datetime. I want to create a new dataframe in which I can reduce the number of rows as follows:

  • Sum the amount for each person
  • Have the proper start_date and end_date (assume the new start_date always starts one day after old end_date without gaps)
  • Copy the Country column as is, without summing/changing anything

The desired output should look as follows:

Name    Amount    Start_date        End_date         Country
John    30        01/01/2022        03/19/2022       USA
Betty   40        01/01/2022        06/30/2022       China
Bruce   17        01/01/2022        05/17/2022       Germany

I believe .groupby(), .sum() or maybe both are needed.

I tried some things with .groupby() over multiple columns, but I can only manage to get one column correct. That is, I only managed to sum over amount, but dates and country would not be successfull. I really need to do all 3 bullet points in one go, otherwise I mess up my dataframe.

Xtiaan
  • 252
  • 1
  • 11
  • First convert both columns to datetimes and use `df = df.groupby(['Name','Country'], as_index=False).agg({'Amount':'sum','Start_date':'min', 'End_date':'max'})` – jezrael Dec 12 '22 at 09:02
  • Can you explain the code? So I group by all columns I do not want to change and then sum over the dates? And what about amount? – Xtiaan Dec 12 '22 at 09:05

0 Answers0