Check if string is in a pandas dataframe

Question

I would like to see if a particular string exists in a particular column within my dataframe.

I'm getting the error

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

import pandas as pd

BabyDataSet = [('Bob', 968), ('Jessica', 155), ('Mary', 77), ('John', 578), ('Mel', 973)]

a = pd.DataFrame(data=BabyDataSet, columns=['Names', 'Births'])

if a['Names'].str.contains('Mel'):
    print ("Mel is there")

score 179 · Accepted Answer · answered Jun 19 '15 at 20:30

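a['Names'].str.contains('Mel') returns a boolean Series with one value per row (length len(BabyDataSet)). An if statement cannot reduce that Series to a single True or False, which is why the ValueError is raised.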

Therefore, you can use

mel_count = a['Names'].str.contains('Mel').sum()
if mel_count > 0:
    print("There are {m} Mels".format(m=mel_count))

Or any(), if you don't care how many records match your query

if a['Names'].str.contains('Mel').any():
    print ("Mel is there")

answered Jun 19 '15 at 20:30 by Uri Goren

3

If there are NaN values in `a['Names']`, use the `na` parameter of the `contains()` function. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.contains.html – Sander Vanden Hautte Feb 16 '19 at 09:22
1

Gotcha number 2: `str.contains('Mel')` matches any substring of each value in the column, so `ABCMelABC` counts as a match for `Mel`. – Eric Leschinski May 31 '21 at 16:43
1

This answer is incorrect and misleading, since it checks whether 'Mel' is contained in any of the strings in the column (e.g. 'hi Mel' in the column also evaluates to True), whereas an exact match of the string is required – umar Jul 22 '21 at 08:14
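The NaN caveat raised in the first comment can be sketched as follows. This is a minimal illustration rather than part of the original answer: the `names` Series and its missing value are made-up sample data, and `na=False` is the `str.contains` parameter described in the linked documentation.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value
names = pd.Series(['Bob', np.nan, 'Mel'], name='Names')

# na=False makes missing values count as "no match",
# so the result is a plain boolean mask
mask = names.str.contains('Mel', na=False)
has_mel = bool(mask.any())

print(mask.tolist())
print(has_mel)
```

With the missing value mapped to False, the mask can also be used for row filtering, e.g. `names[mask]`.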

score 40 · Answer 2 · edited Dec 01 '22 at 14:36

You should use any()

In [98]: a['Names'].str.contains('Mel').any()
Out[98]: True

In [99]: if a['Names'].str.contains('Mel').any():
   ....:     print("Mel is there")
   ....:
Mel is there

a['Names'].str.contains('Mel') gives you a Series of bool values:

In [100]: a['Names'].str.contains('Mel')
Out[100]:
0    False
1    False
2    False
3    False
4     True
Name: Names, dtype: bool

edited Dec 01 '22 at 14:36 by Oren
answered Jun 19 '15 at 18:06 by Zero

1

If I want to check whether any one of several words exists, a['Names'].str.contains("Mel|word_1|word_2") works. Can you please suggest something for an 'and' condition? I want to check whether all the words in my list exist in each row of the dataframe. – Syed Md Ismail Mar 12 '21 at 13:47
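One possible sketch for the 'and' condition asked about in the comment above: build one boolean mask per word and combine them with `&`. The sample frame and the `words` list are hypothetical, and `regex=False` is used on the assumption that the words are literal substrings rather than patterns.

```python
import pandas as pd

# Hypothetical sample data, not from the original question
a = pd.DataFrame({'Names': ['Mel Smith', 'Mel Jones', 'Bob Smith']})
words = ['Mel', 'Smith']

# Start with every row selected, then require each word in turn
mask = pd.Series(True, index=a.index)
for word in words:
    # regex=False treats the word as a literal substring;
    # na=False makes missing values count as "no match"
    mask = mask & a['Names'].str.contains(word, regex=False, na=False)

print(a[mask])
```

Here `mask` is True only for rows that contain every word in the list, so `mask.any()` answers whether at least one such row exists.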

score 20 · Answer 3 · edited May 31 '21 at 16:47

The OP wants to know whether the exact string 'Mel' appears as a value in a particular column, not whether it is contained as a substring of some value in that column. Using contains is therefore unnecessary, and it is also less efficient.

A simple equals-to is enough:

import pandas as pd

df = pd.DataFrame({"names": ["Melvin", "Mel", "Me", "Mel", "A.Mel"]})

# Count the rows that equal 'Mel' exactly
mel_count = (df['names'] == 'Mel').sum()
print("There are {num} instances of 'Mel'. ".format(num=mel_count))

# True if at least one row equals 'Mel' exactly
mel_exists = (df['names'] == 'Mel').any()
if mel_exists:
    print("'Mel' exists in the dataframe.")

# Membership test against the column's underlying array
mel_exists2 = 'Mel' in df['names'].values
print("'Mel' is in the dataframe: " + str(mel_exists2))

Prints:

There are 2 instances of 'Mel'. 
'Mel' exists in the dataframe.
'Mel' is in the dataframe: True

edited May 31 '21 at 16:47 by Eric Leschinski
answered Nov 08 '19 at 17:35 by meizy

2

a similar solution: (a['Names'].eq('Mel')).any() – ivegotaquestion Dec 05 '19 at 15:28
This is the most accurate answer – thentangler Jul 29 '21 at 01:32
Why does one have to go down to NumPy simply to check whether a string is contained in a Series of strings (as in 'Mel' in df['names'].values)? It seems counterproductive. I would expect `'Mel' in df['names']` to work. – K.-Michael Aye Feb 09 '22 at 01:28
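On the last comment: the `in` operator applied directly to a Series tests the index labels, not the values, which is why `'Mel' in df['names']` does not behave as expected. A small sketch with made-up sample data (the variable names are illustrative only):

```python
import pandas as pd

# Hypothetical sample data with the default integer index 0, 1, 2
names = pd.Series(["Melvin", "Mel", "Me"])

# `in` on a Series looks at the index labels
in_series = bool("Mel" in names)      # "Mel" is not an index label
label_hit = bool(1 in names)          # 1 is an index label

# To test the values, go through .values or use isin()
in_values = bool("Mel" in names.values)
via_isin = bool(names.isin(["Mel"]).any())

print(in_series, label_hit, in_values, via_isin)
```

`isin()` stays within pandas and also accepts several candidate strings at once.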

score 9 · Answer 4 · answered Feb 05 '20 at 13:15

I bumped into the same problem, I used:

if "Mel" in a["Names"].values:
    print("Yep")

But this solution may be slower on large columns: `.values` exposes the column as a NumPy array (not a list), and `in` then compares "Mel" against every element.

answered Feb 05 '20 at 13:15 by Christian Pao.

It works for multiple strings in that column, thanks – PyBoss Oct 16 '21 at 05:21

score 4 · Answer 5 · answered Jun 04 '20 at 21:10

If there is any chance that you will need to search for empty strings,

    a['Names'].str.contains('')

will NOT work: every string contains the empty string, so it returns True for every row.

Instead, use

    if '' in a["Names"].values:

to accurately reflect whether or not a string is in a Series, including the edge case of searching for an empty string.
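The edge case above can be checked with a short sketch. The two Series are made-up sample data, and the variable names are only for illustration.

```python
import pandas as pd

# Hypothetical sample data: no empty strings in the first Series,
# one empty string in the second
names = pd.Series(['Bob', 'Mel'])
with_empty = pd.Series(['Bob', ''])

# Every string contains '', so the substring test matches every row
contains_all = bool(names.str.contains('').all())

# An exact membership test distinguishes the two cases
empty_in_names = bool('' in names.values)
empty_in_with_empty = bool('' in with_empty.values)

print(contains_all, empty_in_names, empty_in_with_empty)
```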

score 4 · Answer 6 · answered Jun 10 '21 at 12:17

For case-insensitive search.

a['Names'].str.lower().str.contains('mel').any()
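As a possible alternative to lowercasing the column first, `str.contains` also takes a `case` parameter. The sketch below reuses the names from the question and compares the two approaches; it is an illustration, not part of the original answer.

```python
import pandas as pd

# Names taken from the question's BabyDataSet
a = pd.DataFrame({'Names': ['Bob', 'Jessica', 'Mary', 'John', 'Mel']})

# Approach from the answer: lowercase the column, then search
found_lower = bool(a['Names'].str.lower().str.contains('mel').any())

# Alternative: let str.contains ignore case itself
found_case = bool(a['Names'].str.contains('mel', case=False).any())

print(found_lower, found_case)
```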

answered Jun 10 '21 at 12:17 by Hayat

score 2 · Answer 7 · edited Jun 28 '20 at 19:35

pandas seems to recommend to_numpy(), since the other methods still raise a FutureWarning: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_numpy.html#pandas.DataFrame.to_numpy

So an alternative that would work in this case is:

b = a['Names']
c = b.to_numpy().tolist()
if 'Mel' in c:
    print("Mel is in the dataframe column Names")

edited Jun 28 '20 at 19:35 by β.εηοιτ.βε
answered Jun 28 '20 at 17:44 by RusRus

score 2 · Answer 8 · answered Feb 03 '22 at 16:28

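import re

# Assumes df is an existing DataFrame with a string column 'Name'
s = 'string'

# Per-row list of case-insensitive matches; a non-empty list means s was found
matches = df['Name'].str.findall(s, flags=re.IGNORECASE)
found = matches.str.len().gt(0).any()

# or: keep only the rows whose 'Name' exactly equals one of the given strings
subset = df[df['Name'].isin(['string1', 'string2'])]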

answered Feb 03 '22 at 16:28 by janhavi kulkarni

score 1 · Answer 9 · edited Jan 20 '22 at 15:22

import pandas as pd

# Count the rows where the column equals the string exactly (0 means it is absent)
(data_frame.col_name == 'str_name_to_check').sum()

edited Jan 20 '22 at 15:22 by camille
answered Jan 16 '22 at 20:33 by Harshit Sharma


score 0 · Answer 10 · edited Jul 26 '21 at 04:12

If you want to save the results then you can use this:

a['result'] = a['Names'].apply(lambda x : ','.join([item for item in str(x).split() if item.lower() in ['mel', 'etc']]))

edited Jul 26 '21 at 04:12
answered Jul 26 '21 at 04:02 by SaNa

score -1 · Answer 11 · edited Jul 01 '19 at 08:08

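Check the result of that expression, for example by filtering the matching rows and checking how many are left. Taking len() of the boolean Series itself always gives the number of rows, whether or not anything matched.

if len(a[a['Names'].str.contains('Mel')]) > 0:
    print("Name Present")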

edited Jul 01 '19 at 08:08
answered Jul 01 '19 at 07:12 by Shahir Ansari
