
At the moment I am working on some data and have a problem with duplicates. Here is my problem in detail:

I have the DF:

Col1     Col2     Col3
'aa1'    'bb1'    'cc1'
'aa2'    'bb2'    'cc2'
'aa1'    'bb3'    'cc3'

I can simply use DF.drop_duplicates(subset=['Col1']) and receive

Col1     Col2     Col3
'aa1'    'bb1'    'cc1'
'aa2'    'bb2'    'cc2'

but I am looking for

Col1     Col2            Col3
'aa1'    ['bb1','bb3']  ['cc1','cc3']
'aa2'    ['bb2']        ['cc2']

where the data of Col2 and Col3 are stored as lists in the remaining rows.
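For reference, a reproducible setup (the column values here are plain strings; the quotes in the tables above are just notation, which is an assumption on my part):

```python
import pandas as pd

# reconstruct the example frame from the question
df = pd.DataFrame({
    'Col1': ['aa1', 'aa2', 'aa1'],
    'Col2': ['bb1', 'bb2', 'bb3'],
    'Col3': ['cc1', 'cc2', 'cc3'],
})

# drop_duplicates keeps only the first row per Col1 value,
# so 'bb3' and 'cc3' are discarded entirely
print(df.drop_duplicates(subset=['Col1']))
```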

Thanks, F

Pet

1 Answer


If it is acceptable for all values to be lists, use GroupBy.agg with list:

df1 = df.groupby('Col1').agg(list).reset_index()
print (df1)
    Col1            Col2            Col3
0  'aa1'  ['bb1', 'bb3']  ['cc1', 'cc3']
1  'aa2'         ['bb2']         ['cc2']

If you need lists only for duplicates, use a lambda function with an if-else statement:

f = lambda x: list(x) if len(x) > 1 else x.iat[0]
df2 = df.groupby('Col1').agg(f).reset_index()
print (df2)
    Col1            Col2            Col3
0  'aa1'  ['bb1', 'bb3']  ['cc1', 'cc3']
1  'aa2'           'bb2'           'cc2'
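As a sanity check, the all-lists result from the first approach can be expanded back to the original rows with DataFrame.explode (exploding multiple columns at once needs pandas >= 1.3) — a sketch assuming plain string values:

```python
import pandas as pd

df = pd.DataFrame({
    'Col1': ['aa1', 'aa2', 'aa1'],
    'Col2': ['bb1', 'bb2', 'bb3'],
    'Col3': ['cc1', 'cc2', 'cc3'],
})

# aggregate every non-key column into a list per Col1 value
df1 = df.groupby('Col1').agg(list).reset_index()

# explode both list columns together to recover one row per element
restored = df1.explode(['Col2', 'Col3']).reset_index(drop=True)
print(restored)
```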
jezrael