Pandas equivalent of delete from dataframe where in (...)

Question

I have a dataframe with an email column. I'm trying delete all records from this dataframe where the email address exists in a second dataframe.

In SQL this would be

delete from df1 where email in (select email from df2)

Thanks

Are the dataframes indexed identically? Or do they share a unique identifier? We'll probably end up using the `pd.drop()` command, but it depends on how the two are related. — 3novak, Jun 24 '17 at 02:08

score 1 · Answer 1 · answered Jun 24 '17 at 02:27

1

You can use ~isin

In [30]: df1
Out[30]: 
   pid
0    1
1    2
2    3
3    4

In [31]: df2
Out[31]: 
   pid
0    1
1    2

In [32]: df1=df1[~df1['pid'].isin(df2['pid'])]

In [33]: df1
Out[33]: 
   pid
2    3
3    4

answered Jun 24 '17 at 02:27

danche

1,775
15
22

niraj · Answer 2 · 2017-06-24T03:39:35.927

You can do following using where:

import pandas as pd

df1 = pd.DataFrame({'email':['abc@mail.com','fgh@mail.com','cde@mail.com']})
df2 = pd.DataFrame({'email':['abc@mail.com']})

print(df1)

Output for df1:

          email
0  abc@mail.com
1  fgh@mail.com
2  cde@mail.com

Output for print(df2):

     email
 0  abc@mail.com

Now, using where:

df1 = df1.where(~df1.email.isin(df2.email)).dropna()
print(df1)

Output:

          email
1  fgh@mail.com
2  cde@mail.com

score 0 · Answer 3 · answered Jun 24 '17 at 02:51

You can use isin with drop too. Using boolean indexing and .index will drop those records form the df1 dataframe. Borrowing open-source's setup

df1 = df1.drop(df1[df1.email.isin(df2.email)].index)

Output:

          email
1  fgh@mail.com
2  cde@mail.com

Pandas equivalent of delete from dataframe where in (...)

3 Answers3