2

Here is an extract of my dataframe :

ID LU MA ME JE VE SA DI
200 B B B
201 C C C C C
211 A
211 D D D
211 B
213 A A
216 K K K K
216 K
217 B B B B B

I have some rows with same ID and want to "merge" them into only one row while completing them. Here is an example of what I want to have as a result :

ID LU MA ME JE VE SA DI
200 B B B
201 C C C C C
211 A D D D B
213 A A
216 K K K K K
217 B B B B B

I'm new to pandas dataframes and have try to use drop_duplicates method but I need something different because of the restriction on keep parameters. Also the dataframe is sorted by ID.

bptste
  • 25
  • 4

2 Answers2

2

If there is only one non empty value per groups use:

df = df.replace('',np.nan).groupby('ID', as_index=False).first().fillna('')

If possible multiple values and need unique values in original order use lambda function:

print (df)
    ID LU MA ME JE VE SA DI
0  201  B     C  B         
1  201  C  C  C  B  C    


f = lambda x: ','.join(dict.fromkeys(x.dropna()).keys())
df = df.replace('',np.nan).groupby('ID', as_index=False).agg(f)
print (df)
    ID   LU MA ME JE VE SA DI
0  201  B,C  C  C  B  C      
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
1

This could be treated as a pivot. You'd need to melt the df first then pivot:

(df.melt(id_vars='ID')
   .dropna()
   .pivot(index='ID',columns='variable',values='value')
   .fillna('')
   .rename_axis(None, axis=1)
   .reset_index()
)
Chris
  • 15,819
  • 3
  • 24
  • 37