             id              Name  ...  start_date   end_date
0          2108  Ahsen Javed Khan  ...  2018-02-17 2018-02-18
1          2108  Ahsen Javed Khan  ...  2018-01-23 2018-01-24
2          2108  Ahsen Javed Khan  ...  2012-11-15 2012-12-31
3          2108  Ahsen Javed Khan  ...  2015-05-01 2015-10-31
4          2108  Ahsen Javed Khan  ...  2017-10-01 2017-12-31

I have about 500K records like these. I want the details for every id, and I don't want to drop the rows that share a duplicated id.

I want the output to look like this:

             id              Name  ...  start_date   end_date
0          2108  Ahsen Javed Khan  ...  2018-02-17 2018-02-18
1                                  ...  2018-01-23 2018-01-24
2                                  ...  2012-11-15 2012-12-31
3                                  ...  2015-05-01 2015-10-31
4                                  ...  2017-10-01 2017-12-31
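
A minimal sketch of one way to get the layout above, assuming the repeated rows of an id sit next to each other. It uses only the five sample rows shown in the question, leaves out the columns hidden behind `...`, and keeps the dates as plain strings. The names `key_cols`, `repeated`, and `display_df` are made up for illustration.

```python
import pandas as pd

# Sample rows copied from the question; the columns elided by "..." are left out.
df = pd.DataFrame({
    "id": [2108] * 5,
    "Name": ["Ahsen Javed Khan"] * 5,
    "start_date": ["2018-02-17", "2018-01-23", "2012-11-15", "2015-05-01", "2017-10-01"],
    "end_date": ["2018-02-18", "2018-01-24", "2012-12-31", "2015-10-31", "2017-12-31"],
})

key_cols = ["id", "Name"]  # columns to show only once per id (assumption)

# True for every row whose (id, Name) pair already appeared in an earlier row.
repeated = df.duplicated(subset=key_cols)

# Work on a copy with object dtype so the integer id column can hold "".
display_df = df.astype({"id": object, "Name": object})

# Blank the key columns on the repeated rows; no row is dropped.
display_df.loc[repeated, key_cols] = ""

print(display_df)
```

This is for display only: the blanked rows no longer carry their id, so any filtering or grouping should be done on the original `df`. If rows for the same id are scattered through the 500K records, sorting first (for example `df.sort_values("id", kind="stable")`) would likely be needed to keep each id's rows together.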
  • `df.groupby(['id','Name']).agg(start_date=('start_date','min'), end_date=('end_date','max'))` ? – Umar.H Jun 24 '21 at 10:48
  • If you need something else, let me know. Now I understand your question: you need to remove the duplicated values per column `id` and `Name`. – jezrael Jun 24 '21 at 10:50
  • Actually, I need to map all the other columns to the id, so that when I look up a particular id, all of its relevant columns are returned. What about the `groupby.size()` function? – Syed Farhan Ahmed Jun 24 '21 at 12:42
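
On the last comment, a rough sketch of the difference between counting rows per id and fetching them. `groupby(...).size()` only returns how many rows each id has, while a boolean filter or `get_group` returns every column for that id. The second id (9999, "Example Person") is a hypothetical row added purely for illustration, and `rows_per_id`, `records`, `grouped`, and `records_again` are made-up names.

```python
import pandas as pd

# Five rows from the question plus one hypothetical row (id 9999) for contrast.
df = pd.DataFrame({
    "id": [2108, 2108, 2108, 2108, 2108, 9999],
    "Name": ["Ahsen Javed Khan"] * 5 + ["Example Person"],
    "start_date": ["2018-02-17", "2018-01-23", "2012-11-15",
                   "2015-05-01", "2017-10-01", "2020-01-01"],
    "end_date": ["2018-02-18", "2018-01-24", "2012-12-31",
                 "2015-10-31", "2017-12-31", "2020-01-31"],
})

# Number of rows per id; this counts records but does not return their columns.
rows_per_id = df.groupby("id").size()

# Every column of every row for one id, duplicates of the id included.
records = df.loc[df["id"] == 2108]

# For many lookups, grouping once and reusing the groups is an alternative.
grouped = df.groupby("id")
records_again = grouped.get_group(2108)

print(rows_per_id)
print(records)
```

So `size()` can tell you which ids repeat and how often, but the lookup itself is done by the filter or by `get_group`.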

0 Answers