126

I would like to see if a particular string exists in a particular column within my dataframe.

I'm getting the error

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

import pandas as pd

BabyDataSet = [('Bob', 968), ('Jessica', 155), ('Mary', 77), ('John', 578), ('Mel', 973)]

a = pd.DataFrame(data=BabyDataSet, columns=['Names', 'Births'])

if a['Names'].str.contains('Mel'):
    print ("Mel is there")
Uri Goren
user2242044

11 Answers

179

a['Names'].str.contains('Mel') returns a boolean Series with one value per row (len(a) values in total), not a single True or False, which is why the if statement raises the ValueError.

Therefore, you can use

mel_count=a['Names'].str.contains('Mel').sum()
if mel_count>0:
    print ("There are {m} Mels".format(m=mel_count))

Or use any() if you don't care how many records match your query:

if a['Names'].str.contains('Mel').any():
    print ("Mel is there")
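To see why the original `if` fails, here is a minimal sketch. The names `mask`, `ambiguous`, `found` and `count` are illustrative and do not come from the question. The result of `str.contains` is a whole Series, and Python cannot collapse it to one truth value unless you reduce it with `any()`, `all()` or `sum()`.

```python
import pandas as pd

a = pd.DataFrame(
    [('Bob', 968), ('Jessica', 155), ('Mary', 77), ('John', 578), ('Mel', 973)],
    columns=['Names', 'Births'],
)

mask = a['Names'].str.contains('Mel')  # boolean Series, one entry per row

try:
    bool(mask)  # this is what `if mask:` does implicitly
    ambiguous = False
except ValueError:
    ambiguous = True  # "The truth value of a Series is ambiguous"

found = bool(mask.any())  # True if at least one row matches
count = int(mask.sum())   # number of matching rows
```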
Uri Goren
  • 3
    If there are NaN values in `a['Names']`, use the `na` parameter of the `contains()` function. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.contains.html – Sander Vanden Hautte Feb 16 '19 at 09:22
  • 1
    Gotcha number 2: `str.contains('Mel')` matches on every substring of every row in dataframe column. So `ABCMelABC` == `Mel`. – Eric Leschinski May 31 '21 at 16:43
  • 1
    This answer is incorrect and misleading, since it checks whether 'Mel' is contained in any of the strings in the column (e.g. 'hi Mel' in the column will also evaluate to true), whereas an exact match of the string is required – umar Jul 22 '21 at 08:14
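The two gotchas raised in these comments can be illustrated with a small sketch. The column below is made-up data, and the variable names are only for illustration: `na=False` treats missing values as non-matches, and an equality comparison gives an exact match where `str.contains` gives a substring match.

```python
import numpy as np
import pandas as pd

# Hypothetical column with a missing value and a longer string containing 'Mel'
names = pd.Series(['ABCMelABC', np.nan, 'Mel'])

# Substring match; na=False makes missing entries count as "no match"
substring_mask = names.str.contains('Mel', na=False)

# Exact match: only rows equal to 'Mel'
exact_mask = names == 'Mel'

has_substring = bool(substring_mask.any())
has_exact = bool(exact_mask.any())
```

Which of the two masks is appropriate depends on whether "exists in the column" means "equals a cell" or "appears inside a cell".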
40

You should use any()

In [98]: a['Names'].str.contains('Mel').any()
Out[98]: True

In [99]: if a['Names'].str.contains('Mel').any():
   ....:     print("Mel is there")
   ....:
Mel is there

a['Names'].str.contains('Mel') gives you a series of bool values

In [100]: a['Names'].str.contains('Mel')
Out[100]:
0    False
1    False
2    False
3    False
4     True
Name: Names, dtype: bool
Oren
Zero
  • 1
    If I want to check whether any one of several words exists, a['Names'].str.contains("Mel|word_1|word_2") works. Can you please suggest something for an 'and' condition? I want to check whether all the words in my list exist in each row of the dataframe – Syed Md Ismail Mar 12 '21 at 13:47
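One possible way to handle the 'and' condition asked about in this comment is to build one mask per word and combine them with `&`. This is only a sketch: the data and the `words` list are hypothetical, and `regex=False` is an assumption that the words should be treated as literal text rather than patterns.

```python
import pandas as pd

names = pd.Series(['Mel Smith', 'Mel', 'John Smith'])  # hypothetical data
words = ['Mel', 'Smith']                                # hypothetical word list

# Start from all-True and AND in one literal substring test per word
mask = pd.Series(True, index=names.index)
for word in words:
    mask &= names.str.contains(word, regex=False)

rows_with_all_words = names[mask]
```

`mask.any()` then tells you whether at least one row contains every word, and `mask.all()` whether every row does.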
20

The OP wants to know whether the exact string 'Mel' exists in a particular column, not whether 'Mel' is contained in any string in the column. str.contains is therefore unnecessary here, and it is less efficient than a plain comparison.

A simple equals-to is enough:

import pandas as pd

df = pd.DataFrame({"names": ["Melvin", "Mel", "Me", "Mel", "A.Mel"]})

# Count the rows that are exactly 'Mel'
mel_count = (df['names'] == 'Mel').sum()
print("There are {num} instances of 'Mel'. ".format(num=mel_count))

# True if at least one row is exactly 'Mel'
mel_exists = (df['names'] == 'Mel').any()
if mel_exists:
    print("'Mel' exists in the dataframe.")

# Membership test against the underlying array
mel_exists2 = 'Mel' in df['names'].values
print("'Mel' is in the dataframe: " + str(mel_exists2))

Prints:

There are 2 instances of 'Mel'. 
'Mel' exists in the dataframe.
'Mel' is in the dataframe: True
Eric Leschinski
meizy
9

I bumped into the same problem, I used:

if "Mel" in a["Names"].values:
    print("Yep")

But this solution may be slow on a large column: .values exposes the column's underlying NumPy array, and the in test then compares every element against "Mel".

Christian Pao.
4

If there is any chance that you will need to search for empty strings,

    a['Names'].str.contains('') 

will NOT work, as it will always return True.

Instead, use

    if '' in a["Names"].values:
        print("Empty string is there")

to accurately reflect whether or not a string is in a Series, including the edge case of searching for an empty string.
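A short sketch of this edge case, using a made-up column that contains no empty strings (the variable names are illustrative):

```python
import pandas as pd

names = pd.Series(['Bob', 'Mel'])  # hypothetical column, no empty strings

# '' is a substring of every string, so every row matches
substring_hit = bool(names.str.contains('').any())

# Membership test compares whole values, so nothing matches here
exact_hit = '' in names.values
```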

baileyw
4

For a case-insensitive search:

a['Names'].str.lower().str.contains('mel').any()
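As an alternative sketch, `str.contains` also accepts a `case` parameter, which avoids the extra `str.lower()` pass. The data below is hypothetical and only serves to compare the two spellings:

```python
import pandas as pd

a = pd.DataFrame({'Names': ['Bob', 'MEL', 'Mary']})  # hypothetical data

# Lower-case the column first, then search for a lower-case pattern
via_lower = bool(a['Names'].str.lower().str.contains('mel').any())

# Let str.contains ignore case itself
via_case_flag = bool(a['Names'].str.contains('mel', case=False).any())
```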
Hayat
2

Pandas seems to recommend df.to_numpy(), since the other methods still raise a FutureWarning: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_numpy.html#pandas.DataFrame.to_numpy

So an alternative that would work in this case is:

b = a['Names']
c = b.to_numpy().tolist()
if 'Mel' in c:
    print("Mel is in the dataframe column Names")
β.εηοιτ.βε
RusRus
2
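import re
s = 'string'

# df is assumed to be an existing DataFrame with a 'Name' column.
# Case-insensitive: list every match of s found in each row
matches = df['Name'].str.findall(s, flags=re.IGNORECASE)

# or: keep only the rows whose Name is exactly one of the listed strings
filtered = df[df['Name'].isin(['string1', 'string2'])]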
1
import pandas as pd

(data_frame.col_name=='str_name_to_check').sum()
camille
  • 1
    Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Keegan Murphy Jan 16 '22 at 21:00
0

If you want to save the results then you can use this:

a['result'] = a['Names'].apply(lambda x : ','.join([item for item in str(x).split() if item.lower() in ['mel', 'etc']]))
SaNa
-1

Check the length of the filtered rows rather than the truth value of the Series itself:

if len(a[a['Names'].str.contains('Mel')]) > 0:
    print("Name Present")
Shahir Ansari