Suppose I have a dataframe as follows:
(MM/DD/YYYY)
Name Amount Start_date End_date Country
John 11 01/01/2022 01/02/2022 USA
Betty 13 01/01/2022 04/16/2022 China
Bruce 17 01/01/2022 05/17/2022 Germany
John 19 02/01/2022 03/19/2022 USA
Betty 27 04/17/2022 06/30/2022 China
In this dataframe, the date columns are of datatype datetime. I want to create a new dataframe in which I can reduce the number of rows as follows:
- Sum the amount for each person
- Have the proper start_date and end_date (assume the new start_date always starts one day after old end_date without gaps)
- Copy the Country column as is, without summing/changing anything
The desired output should look as follows:
Name Amount Start_date End_date Country
John 30 01/01/2022 03/19/2022 USA
Betty 40 01/01/2022 06/30/2022 China
Bruce 17 01/01/2022 05/17/2022 Germany
I believe .groupby(), .sum() or maybe both are needed.
I tried some things with .groupby() over multiple columns, but I can only manage to get one column correct. That is, I only managed to sum over amount, but dates and country would not be successfull. I really need to do all 3 bullet points in one go, otherwise I mess up my dataframe.