1

I need rows from df1 doesn't exist in df2 based on 3 columns [Time1, ID1, Order1]. I need df3 has rows of df1 don't exist on df2 Note: Time1 is in datetime format,

Example

input

df1

Time1 ID1 Order1
12/14/2022 6:10:32 PM X A
9/15/2022 2:45:57 AM Y B
9/15/2022 11:08:26 AM Z C

df2

Time2 ID2 Order2
12/14/2022 6:15:35 PM X A
12/14/2022 6:00:35 PM Y B
9/15/2022 2:45:57 AM Y B

output df3

Time1 ID1 Order1
12/14/2022 6:10:32 PM X A
9/15/2022 11:08:26 AM Z C

1 Answers1

1

Use a merge with indicator and indexing:

df3 = df1.loc[df1.merge(df2, left_on=['Time1', 'ID1', 'Order1'],
                             right_on=['Time2', 'ID2', 'Order2'],
                        how='left', indicator=True)
                 .query('_merge == "left_only"').index
             ]

Or:

df3 = df1.loc[df1.merge(df2, left_on=['Time1', 'ID1', 'Order1'],
                             right_on=['Time2', 'ID2', 'Order2'],
                        how='left', indicator=True)
                 ['_merge'].eq('left_only')
             ]

Output:

                   Time1 ID1 Order1
0  12/14/2022 6:10:32 PM   X      A
2  9/15/2022 11:08:26 AM   Z      C
mozway
  • 194,879
  • 13
  • 39
  • 75