2

I have a dataframe with an email column. I'm trying delete all records from this dataframe where the email address exists in a second dataframe.

In SQL this would be

delete from df1 where email in (select email from df2)

Thanks

andyabel
  • 335
  • 4
  • 15
  • Are the dataframes indexed identically? Or do they share a unique identifier? We'll probably end up using the `pd.drop()` command, but it depends on how the two are related. – 3novak Jun 24 '17 at 02:08

3 Answers3

1

You can use ~isin

In [30]: df1
Out[30]: 
   pid
0    1
1    2
2    3
3    4

In [31]: df2
Out[31]: 
   pid
0    1
1    2

In [32]: df1=df1[~df1['pid'].isin(df2['pid'])]

In [33]: df1
Out[33]: 
   pid
2    3
3    4
danche
  • 1,775
  • 15
  • 22
1

You can do following using where:

import pandas as pd

df1 = pd.DataFrame({'email':['abc@mail.com','fgh@mail.com','cde@mail.com']})
df2 = pd.DataFrame({'email':['abc@mail.com']})

print(df1)

Output for df1:

          email
0  abc@mail.com
1  fgh@mail.com
2  cde@mail.com

Output for print(df2):

     email
 0  abc@mail.com

Now, using where:

df1 = df1.where(~df1.email.isin(df2.email)).dropna()
print(df1)

Output:

          email
1  fgh@mail.com
2  cde@mail.com
niraj
  • 17,498
  • 4
  • 33
  • 48
0

You can use isin with drop too. Using boolean indexing and .index will drop those records form the df1 dataframe. Borrowing open-source's setup

df1 = df1.drop(df1[df1.email.isin(df2.email)].index)

Output:

          email
1  fgh@mail.com
2  cde@mail.com
Scott Boston
  • 147,308
  • 15
  • 139
  • 187