I have a datatable Frame as follows:
import datatable as dt

my_dt = dt.Frame({'last_name': ['mallesh', 'bhavik', 'jagarini', 'mallesh', 'jagarini'],
                  'first_name': ['yamulla', 'vemulla', 'yegurla', 'yamulla', 'yegurla'],
                  'ssn': ['1234', '7847', '0648', '4567', '0648']})
I would like to find duplicate rows based on the last_name and first_name columns. If duplicates are found and their SSNs differ, the SSNs should be rolled up into a single value separated by a semicolon (;). If the SSNs are also the same, only one SSN should be kept.
The expected output is:
Since mallesh yamulla is duplicated and has different SSNs, they are rolled up with ';'.
In the case of jagarini yegurla, both rows carry the same SSN, so it appears only once.
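
To make the requirement concrete, here is a minimal sketch of the roll-up logic in plain Python. It does not use datatable's grouping API. It works on the same dict of lists that the Frame is built from, and the names `data`, `roll_up_ssn` and `rolled` are hypothetical, chosen only for illustration.

```python
# Same columns as the Frame in the question, kept as a plain dict of lists.
data = {
    'last_name': ['mallesh', 'bhavik', 'jagarini', 'mallesh', 'jagarini'],
    'first_name': ['yamulla', 'vemulla', 'yegurla', 'yamulla', 'yegurla'],
    'ssn': ['1234', '7847', '0648', '4567', '0648'],
}


def roll_up_ssn(columns):
    """Group rows by (last_name, first_name) and join distinct SSNs with ';'."""
    grouped = {}  # (last_name, first_name) -> distinct SSNs in first-seen order
    rows = zip(columns['last_name'], columns['first_name'], columns['ssn'])
    for last, first, ssn in rows:
        ssns = grouped.setdefault((last, first), [])
        if ssn not in ssns:  # an identical SSN is kept only once
            ssns.append(ssn)
    return {
        'last_name': [key[0] for key in grouped],
        'first_name': [key[1] for key in grouped],
        'ssn': [';'.join(values) for values in grouped.values()],
    }


rolled = roll_up_ssn(data)
print(rolled)
```

Since the result is again a dict of lists, it could presumably be passed back to dt.Frame in the same way as the original data. Rows come out in the order each name pair first appears.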