
I have a data frame as below:

import pandas as pd

df_add = pd.DataFrame({
    'doc_id':[100,101,102,103],
    'last_name':['Mallesham','Mallesham','Samba','Bhavik'],
    'first_name':['Yamulla','Yamulla','Anil','Yamulla'],
    'dob':['06-03-1900','06-03-1900','20-09-2020','09-16-2020']
})


Here, doc_id 100 and 101 are duplicate rows when considering last name, first name, and DOB.

My requirement is to roll up 101 into 100 as follows:

[image: expected output]

doc_id should be filled in as 100;101, with a semicolon separator.

In a second case:

If I consider just the last_name and first_name combination, it should display as below, since people with the same name might have different DOBs:

[image: expected output]

myamulla_ciencia

1 Answer


You need to convert doc_id to str in order to use the str.cat function:

df_add["doc_id"] = df_add["doc_id"].astype(str)

# Select doc_id as a Series (single brackets) so that .str is available,
# and join with ';' as the question asks
resultant_df = df_add.groupby(["first_name", "last_name", "dob"])["doc_id"].apply(
    lambda x: x.str.cat(sep=';'))

print(resultant_df.reset_index())

     first_name  last_name  dob         doc_id
0    Anil        Samba      20-09-2020  102
1    Yamulla     Bhavik     09-16-2020  103
2    Yamulla     Mallesham  06-03-1900  100;101
qaiser
  • What if I consider just the first_name and last_name combination? It should display dob joined with ; and doc_id joined with ;, as shown in the last picture in the question. – myamulla_ciencia Mar 21 '22 at 04:51
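For the name-only case raised in the comment, one possible approach is to group on the name columns and join every remaining column with a semicolon. The following is a minimal sketch, not something taken from the answer above: `roll_up` is a hypothetical helper name, and it assumes that repeated values (such as two identical DOBs) should all be kept rather than de-duplicated.

```python
import pandas as pd

df_add = pd.DataFrame({
    'doc_id': [100, 101, 102, 103],
    'last_name': ['Mallesham', 'Mallesham', 'Samba', 'Bhavik'],
    'first_name': ['Yamulla', 'Yamulla', 'Anil', 'Yamulla'],
    'dob': ['06-03-1900', '06-03-1900', '20-09-2020', '09-16-2020'],
})


def roll_up(df, keys, sep=';'):
    """Hypothetical helper: return one row per `keys` combination, with
    every other column collapsed into a `sep`-joined string."""
    value_cols = [c for c in df.columns if c not in keys]
    # str.join only accepts strings, so cast the non-key columns first.
    # astype returns a copy, so the caller's frame is left unchanged.
    as_text = df.astype({c: str for c in value_cols})
    # Map each non-key column to the same join function.
    return as_text.groupby(keys, as_index=False).agg(
        {c: sep.join for c in value_cols})


# Second case: group on the names only, so both doc_id and dob are joined.
by_name = roll_up(df_add, ['last_name', 'first_name'])
print(by_name)
```

With the sample data, the Mallesham / Yamulla row ends up with doc_id `100;101` and dob `06-03-1900;06-03-1900`. Passing `['last_name', 'first_name', 'dob']` as `keys` reproduces the first case, where only doc_id is joined.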