0

I have two date frames as below:

import pandas as pd
df1 = pd.DataFrame({'serialNo':['aaaa','bbbb','cccc','ffff','aaaa','bbbb','aaaa'],
               'Name':['Sayonti','Ruchi','Tony','Gowtam','Toffee','Tom','Sayonti'],
               'testName':   [4402, 3747 ,5555,8754,1234,9876,3602],
               'moduleName':   ['singing', 'dance','booze', 'vocals','drama','paint','singing'],
               'endResult': ['WARNING', 'FAILED', 'WARNING', 'FAILED','WARNING','FAILED','WARNING'],
               'Date':['2018-10-5','2018-10-6','2018-10-7','2018-10-8','2018-10-9','2018-10-10','2018-10-8'],
               'Time_df1':['23:26:39','22:50:31','22:15:28','21:40:19','21:04:15','20:29:11','19:54:03']})

df2 = pd.DataFrame({'serialNo':['aaaa','bbbb','aaaa','ffff','xyzy','aaaa'],
               'Food':['Strawberry','Coke','Pepsi','Nuts','Apple','Candy'],
               'Work':   ['AP', 'TC','OD', 'PU','NO','PM'],
               'Date':['2018-10-4','2018-10-6','2018-10-5','2018-10-7','2018-10-5','2018-10-10'],
               'Time_df2':['09:00:00','10:00:00','11:00:00','12:00:00','13:00:00','14:00:00']
               })

Now I have merged the two frames as below:

df1['Date'] = pd.to_datetime(df1['Date'])
df2['Date'] = pd.to_datetime(df2['Date'])
result = pd.merge(df1,df2,on=['serialNo'],how='inner')

I want to group by

result = result[result.Date_x.sub(result.Date_y).dt.days.between(0,3)]
result.drop(['Date_x','Date_y','Time_df1','Time_df2'],axis=1,inplace=True)
result = result.groupby(['serialNo'])['Food'].apply(','.join).reset_index()

But I want the out put to look like this:

output = pd.DataFrame({'serialNo':['aaaa','bbbb','ffff'],
               'Name':['Sayonti,Sayonti,Sayonti','Ruchi','Gowtam'],
               'testName':   ['4402,4402,3602','3747','8754'],
               'moduleName':   ['singing,singing,singing', 'dance','vocals'],
               'endResult': ['WARNING,WARNING,WARNING','FAILED','FAILED'],
               'Food':['Strawberry,Pepsi,Pepsi','Coke','Nuts'],
               'Work':['AP,OD,OD','TC','PU']})

How do I achieve this? I basically need to figure out how to .apply(','.join) for multiple columns together?

sayo
  • 207
  • 4
  • 18
  • According to the `output` you define, it seems that 'SerialNo' is not the only column you want to groupby, otherwise you would not have 2 rows with `'aaaa'`, can you explain? – Ben.T Nov 08 '18 at 20:57
  • Yes @Ben.T you are correct the 'aaaa' should all be one row I have changed my question accordingly. – sayo Nov 08 '18 at 22:04

1 Answers1

2

You can use either:

result.groupby('serialNo').agg(list) #To get a list of values

Output:

                                 Name            testName  \
serialNo                                                    
aaaa      [Sayonti, Sayonti, Sayonti]  [4402, 4402, 3602]   
bbbb                          [Ruchi]              [3747]   
ffff                         [Gowtam]              [8754]   

                           moduleName                    endResult  \
serialNo                                                             
aaaa      [singing, singing, singing]  [WARNING, WARNING, WARNING]   
bbbb                          [dance]                     [FAILED]   
ffff                         [vocals]                     [FAILED]   

                                Food          Work  
serialNo                                            
aaaa      [Strawberry, Pepsi, Pepsi]  [AP, OD, OD]  
bbbb                          [Coke]          [TC]  
ffff                          [Nuts]          [PU]  

Or

result.groupby('serialNo').agg(lambda x: ', '.join(x.astype(str))) #to get comma separated strings

Output:

                               Name          testName  \
serialNo                                                
aaaa      Sayonti, Sayonti, Sayonti  4402, 4402, 3602   
bbbb                          Ruchi              3747   
ffff                         Gowtam              8754   

                         moduleName                  endResult  \
serialNo                                                         
aaaa      singing, singing, singing  WARNING, WARNING, WARNING   
bbbb                          dance                     FAILED   
ffff                         vocals                     FAILED   

                              Food        Work  
serialNo                                        
aaaa      Strawberry, Pepsi, Pepsi  AP, OD, OD  
bbbb                          Coke          TC  
ffff                          Nuts          PU  
Scott Boston
  • 147,308
  • 15
  • 139
  • 187