I have an example dataframe like this:

import pandas as pd

df = pd.DataFrame({
    'brand': ['Yum Yum', 'ByRice', 'LuxSoba', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'flavour': ['chili', 'chicken', 'chili', 'beef', 'cheese']
})
df
     brand style  flavour
0  Yum Yum   cup    chili
1   ByRice   cup  chicken
2  LuxSoba   cup    chili
3  Indomie  pack     beef
4  Indomie  pack   cheese

My goal is to transform the dataframe so that duplicate brand entries are removed, but when a brand has several flavours they are all combined into one cell on the brand's first entry. The result should look like this:

     brand style       flavour
0  Yum Yum   cup         chili
1   ByRice   cup       chicken
2  LuxSoba   cup         chili
3  Indomie  pack  beef, cheese

I'm not sure how to approach this problem.

2 Answers

You can group by brand and style, then join the flavours within each group:

import pandas as pd

df = pd.DataFrame({
    'brand': ['Yum Yum', 'ByRice', 'LuxSoba', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'flavour': ['chili', 'chicken', 'chili', 'beef', 'cheese']
})
# One row per (brand, style); the flavours of each group are joined with ', '
df2 = df.groupby(['brand', 'style'])['flavour'].agg(lambda x: ', '.join(x)).reset_index()


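One thing to watch for: groupby sorts the group keys by default, so the rows come back ordered by brand rather than in their original order. A minimal sketch of the difference, assuming the same df as in the question (the names sorted_df and ordered_df are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'brand': ['Yum Yum', 'ByRice', 'LuxSoba', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'flavour': ['chili', 'chicken', 'chili', 'beef', 'cheese']
})

# Default (sort=True): groups are ordered by their keys.
sorted_df = df.groupby(['brand', 'style'])['flavour'].agg(', '.join).reset_index()

# sort=False: groups keep the order in which they first appear in df.
ordered_df = (
    df.groupby(['brand', 'style'], sort=False)['flavour']
      .agg(', '.join)
      .reset_index()
)

print(sorted_df['brand'].tolist())
print(ordered_df['brand'].tolist())
```

If the original row order matters, as it seems to in the expected output, passing sort=False is probably what you want.
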
TheEngineerProgrammer

You can use groupby with agg:

>>> df.groupby(['brand', 'style'], sort=False, as_index=False)['flavour'].agg(', '.join)
     brand style       flavour
0  Yum Yum   cup         chili
1   ByRice   cup       chicken
2  LuxSoba   cup         chili
3  Indomie  pack  beef, cheese
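
Grouping on both brand and style only collapses rows that share both values. If a brand could appear with more than one style, or a flavour could repeat or be missing, something like the following might be closer to "one row per brand". This is only a sketch under those assumptions: the sample rows are made up for illustration and join_unique is a hypothetical helper, not a pandas function.

```python
import pandas as pd

# Hypothetical data: Indomie appears with two styles and a repeated flavour.
df = pd.DataFrame({
    'brand': ['Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
    'style': ['cup', 'pack', 'pack', 'cup'],
    'flavour': ['chili', 'beef', 'cheese', 'beef'],
})

def join_unique(values):
    # Drop missing values and repeats, keeping first-seen order, then join.
    return ', '.join(values.dropna().drop_duplicates())

out = (
    df.groupby('brand', sort=False, as_index=False)
      .agg({'style': 'first', 'flavour': join_unique})
)
print(out)
```

Here 'first' keeps the style of the brand's first row, which matches the "append to the first entry" wording in the question; whether that is the right choice depends on your data.
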
Corralien