Check if value from one dataframe exists in another dataframe

Question

I have 2 dataframes.

Df1 = pd.DataFrame({'name': ['Marc', 'Jake', 'Sam', 'Brad']
Df2 = pd.DataFrame({'IDs': ['Jake', 'John', 'Marc', 'Tony', 'Bob']

I want to loop over every row in Df1['name'] and check if each name is somewhere in Df2['IDs'].

The result should return 1 if the name is in there, 0 if it is not like so:

Marc  1 
Jake  1
Sam   0 
Brad  0

Thank you.

For those with the same question, who just want the `Boolean` (`True` or `False`), don't use `.astype(int)`. Just use `df1[col].isin(df2[col])` — Trenton McKinney, Jan 14 '21 at 21:24

piRSquared · Accepted Answer · 2018-05-21T12:45:46.027

79

Use isin

Df1.name.isin(Df2.IDs).astype(int)

0    1
1    1
2    0
3    0
Name: name, dtype: int32

Show result in data frame

Df1.assign(InDf2=Df1.name.isin(Df2.IDs).astype(int))

   name  InDf2
0  Marc      1
1  Jake      1
2   Sam      0
3  Brad      0

In a Series object

pd.Series(Df1.name.isin(Df2.IDs).values.astype(int), Df1.name.values)

Marc    1
Jake    1
Sam     0
Brad    0
dtype: int32

edited May 21 '18 at 12:45

answered May 21 '18 at 12:40

piRSquared

285,575
57
475
624

1

Perfect. Thank you – toceto May 21 '18 at 13:39
Is it compulsory to include the data type? – Yonela Nuba Dec 04 '19 at 13:58
I had such a problem with checking if value in dataframe exists in series or other dataframe! I was getting the error 'Series' objects are mutable, thus they cannot be hashed, and no matter how I created the series, it was never immutable. This was the only thing that worked. I couldn't however, get the assign. to create a new column. No error created, but no column. Df1["InDf2"] = Df1.name.isin... did create the column. Thanks! – DrWhat Mar 05 '21 at 12:05

score 17 · Answer 2 · answered May 21 '18 at 12:42

17

This should do it:

Df1 = Df1.assign(result=Df1['name'].isin(Df2['IDs']).astype(int))

answered May 21 '18 at 12:42

zipa

27,316
6
40
58

This is the solution, if you want to have it in your dataframe, piRSquared solutions only print the results. – PV8 Dec 09 '19 at 10:58
how can I add 1 more condition to above code. For example if there are some values in DF1['name'] for which I want to skip the check @zipa – yoyo Jan 28 '22 at 20:23

score 14 · Answer 3 · answered May 21 '18 at 14:03

14

By using merge

s=Df1.merge(Df2,left_on='name',right_on='IDs',how='left')
s.IDs=s.IDs.notnull().astype(int)
s
Out[68]: 
   name  IDs
0  Marc    1
1  Jake    1
2   Sam    0
3  Brad    0

answered May 21 '18 at 14:03

BENY

317,841
20
164
234

score 9 · Answer 4 · answered May 21 '18 at 12:40

9

This is one way. Convert to set for O(1) lookup and use astype(int) to represent Boolean values as integers.

values = set(Df2['IDs'])

Df1['Match'] = Df1['name'].isin(values).astype(int)

answered May 21 '18 at 12:40

jpp

159,742
34
281
339

Check if value from one dataframe exists in another dataframe

4 Answers4

Linked

Related