1

I have two data frames.

df1 = pd.DataFrame({'Ver' : [2,2,2], 
                   'SN' : [1,1,1], 
                   'Split' : [AA,AA,AA] 
                   'Quad' : [3,3,4] 
                   'Channel' : [1,2,0]
                   'Mean' : [1,2,3] 
                   'Other' : [10,10,10]
                   'Mean2' : [1,2,3] 
                   'Other2' : [10,10,10]  
})

df1 = pd.DataFrame({'Ver' : [2,2,2], 
                   'SN' : [1,1,1], 
                   'Split' : [AA,AA,AA] 
                   'Quad' : [3,3,4] 
                   'Channel' : [3,2,0]
                   'Mean' : [4,5,6] 
                   'Other' : [10,10,10]  
                   'Mean2' : [4,5,6] 
                    'Other2' : [10,10,10]  
})

i'd like to subtract the mean column in df1 from the mean column in df2 and do this only for elements whose values match in columns 'SN', 'Quad', 'Channel'. if those values don't match up, i'd like to "throw out" the data. I'd like to do the same for the mean2 column. i'd like assign the resultant data frame to a new dataframe. is there an easy/straightforward way to do this in pandas?

i should end up with :

resultant_df= pd.DataFrame({'Ver' : [2,2], 
                   'SN' : [1,1], 
                   'Split' : [AA,AA] 
                   'Quad' : [3,4] 
                   'Channel' : [2,0]
                   'Mean' : [-3,-3] 
                   'Other' : [,10,10]  
                   'Mean2' : [-3,-3] 
                    'Other2' : [10,10]  
})              

1 Answers1

0
In [70]: df1 = pd.DataFrame({ 'SN':[1,1,1], 'Quad':[3,3,4], 
    ...:                      'Channel':[1,2,0], 'Mean':[1,2,3] })

In [71]: df2 = pd.DataFrame({ 'SN':[1,1,1], 'Quad':[3,3,4], 
    ...:                      'Channel':[3,2,0], 'Mean':[4,5,6] })

In [72]: results = df1.merge( df2, on=['Channel','Quad','SN'] )

In [73]: results['Mean'] = results.Mean_x - results.Mean_y

In [74]: results[ list(df1.columns) ]
Out[74]: 
   Channel  Mean  Quad  SN
0        2    -3     3   1
1        0    -3     4   1
JohnE
  • 29,156
  • 8
  • 79
  • 109
  • Thank @JohnE! one more issue is that the number of rows between my dataframes are not the same i.e. df1 may have more rows than df2. is there anyway to get around that? I had no idea we could use boolean evaluations like this. Your answer was informative – user3416986 Mar 12 '15 at 19:20
  • thanks for the tip. I tried set_index() and I get the following error message (same one as before): Series lengths must match to compare. this occurs when i set the conditions for comparison, similar to your bool = ((df1.Channel == df2.Channel) & (df1.Quad == df2.Quad)) line – user3416986 Mar 13 '15 at 08:48
  • Ah, yeah, I just re-wrote the answer using merge(). That's a better way to do what you're trying to do. – JohnE Mar 13 '15 at 12:13
  • Thanks @JohnE--this solved my problem nicely; I also learned a thing or two in the process. – user3416986 Mar 13 '15 at 23:56