How can I remove all duplicated elements from a list, but update the first entry's value in another column if it is not the same?

Question

I have an example dataframe like this:

import pandas as pd

df = pd.DataFrame({
    'brand': ['Yum Yum', 'ByRice', 'LuxSoba', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'flavour': ['chili', 'chicken', 'chili', 'beef', 'cheese']
})
df
     brand style  flavour
0  Yum Yum   cup    chili
1   ByRice   cup  chicken
2  LuxSoba   cup    chili
3  Indomie  pack     beef
4  Indomie  pack   cheese

My goal is to transform the dataframe so that duplicate brand entries are removed, but if a brand has several flavours, they are all combined into the flavour column of the first entry. The dataframe should then look like this:

     brand style       flavour
0  Yum Yum   cup         chili
1   ByRice   cup       chicken
2  LuxSoba   cup         chili
3  Indomie  pack  beef, cheese

I'm not sure how to approach this problem.

Try groupby + agg with ', '.join as the function – rafaelc, Apr 06 '23 at 13:22

score 1 · Answer 1 · answered Apr 06 '23 at 13:24

You can do this:

import pandas as pd

df = pd.DataFrame({
    'brand': ['Yum Yum', 'ByRice', 'LuxSoba', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'flavour': ['chili', 'chicken', 'chili', 'beef', 'cheese']
})
# Group rows sharing brand and style, then join their flavours into one string
df2 = df.groupby(['brand', 'style'])['flavour'].agg(', '.join).reset_index()

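Result: df2 has one row per brand and style, with that group's flavours joined into a single comma-separated string.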
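One thing worth checking with this approach is row order. As far as I know, groupby sorts the group keys by default, so the rows may not come back in the order of the original dataframe. The sketch below compares the default with sort=False on the question's data; the names sorted_result and ordered_result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'brand': ['Yum Yum', 'ByRice', 'LuxSoba', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'flavour': ['chili', 'chicken', 'chili', 'beef', 'cheese']
})

# Default behaviour: group keys are sorted, so brands come out alphabetically
sorted_result = (
    df.groupby(['brand', 'style'])['flavour']
      .agg(', '.join)
      .reset_index()
)

# sort=False keeps groups in order of first appearance
ordered_result = (
    df.groupby(['brand', 'style'], sort=False)['flavour']
      .agg(', '.join)
      .reset_index()
)

print(sorted_result['brand'].tolist())
print(ordered_result['brand'].tolist())
```

If the output should match the layout asked for in the question, passing sort=False is probably the simpler choice.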

score 0 · Answer 2 · answered Apr 06 '23 at 13:26

You can use groupby followed by agg:

>>> df.groupby(['brand', 'style'], sort=False, as_index=False)['flavour'].agg(', '.join)
     brand style       flavour
0  Yum Yum   cup         chili
1   ByRice   cup       chicken
2  LuxSoba   cup         chili
3  Indomie  pack  beef, cheese
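The title also asks that a value only be appended "if not the same". A plain ', '.join would repeat a flavour that occurs twice for one brand. Here is a minimal sketch of one way to handle that, assuming a repeated flavour should be listed once; the extra 'beef' row and the helper name join_unique are made up for illustration and are not part of the original data.

```python
import pandas as pd

# Same data as in the question, plus one hypothetical repeated row
# (Indomie / pack / beef) to show the de-duplication
df = pd.DataFrame({
    'brand': ['Yum Yum', 'ByRice', 'LuxSoba', 'Indomie', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack', 'pack'],
    'flavour': ['chili', 'chicken', 'chili', 'beef', 'cheese', 'beef']
})

def join_unique(values):
    # dict.fromkeys drops repeats while keeping first-seen order
    return ', '.join(dict.fromkeys(values))

result = (
    df.groupby(['brand', 'style'], sort=False, as_index=False)['flavour']
      .agg(join_unique)
)
print(result)
```

With the sample rows above, Indomie should end up with the flavours joined as 'beef, cheese' rather than 'beef, cheese, beef'.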
