python how to check if a string is an element of a list of strings

Question

In Python, how do I check whether a string is an element of a list of strings?

The example data I am working with is:

import pandas as pd
testData=pd.DataFrame({'value':['abc','cde','fgh']})

Why does the following code return False?

testData['value'][0] in testData['value']

Sorry, the data will be stored as a Series containing individual strings in your sample df, but is your real df data really a list of strings for each row? That is fundamentally different. – EdChum, Oct 28 '16 at 14:11
@EdChum's answer is a good one. To fix your original error, you simply need to check the values of testData['value'], so your last line becomes `testData['value'][0] in testData['value'].values` and you will get True. – A.Kot, Oct 28 '16 at 14:14
@EdChum, I guess the example data is a more accurate description of my problem. The fundamental difference you mentioned might be the thing I overlooked. – cone001, Oct 28 '16 at 14:16
Actually, I can't explain `testData['value'][0] in testData['value']`: when the scalar value is the lhs, it's somehow able to evaluate the `Series` array into a scalar boolean, which is weird. – EdChum, Oct 28 '16 at 14:19
@EdChum What's the confusion? Changing `testData['value']` to `testData['value'].values` corrects the error. – A.Kot, Oct 28 '16 at 15:03
@A.Kot the confusion is why `testData['value'][0] in testData['value']` gives `False`, not that `testData['value'][0] in testData['value'].values` works. The latter uses `np.ndarray.__contains__`, which does what you expect, whilst for a pandas `Series` you're checking for membership of the index; see the bottom of my answer. – EdChum, Oct 28 '16 at 15:05

EdChum · Accepted Answer · 2016-10-28T14:27:59.740

You can use the vectorised str.contains to test whether a string is contained in each row:

In [262]:
testData['value'].str.contains(testData['value'][0])

Out[262]:
0     True
1    False
2    False
Name: value, dtype: bool
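One thing worth knowing here: str.contains treats its argument as a regular expression by default, so a search string containing characters such as '.' may match more rows than intended. A minimal sketch of the difference, assuming a made-up Series (the 'a.c' entry is not part of the question's data):

```python
import pandas as pd

# Hypothetical data: 'a.c' is added only to illustrate the regex behaviour.
s = pd.Series(['abc', 'a.c', 'fgh'])

# Default (regex=True): '.' matches any character, so 'abc' matches too.
regex_match = s.str.contains('a.c')

# regex=False: the pattern is treated as a literal substring.
literal_match = s.str.contains('a.c', regex=False)

print(regex_match.tolist())
print(literal_match.tolist())
```

If the search string comes from the data itself, as in testData['value'][0], passing regex=False is probably the safer choice.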

If you're after whether it's present in any row, then use any:

In [264]:
testData['value'].str.contains(testData['value'][0]).any()

Out[264]:
True
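Note that str.contains is a substring test, so it also reports True for a fragment of an element. If the goal is to check whether a string is exactly one of the elements, comparing whole values is closer to what the question asks. A short sketch using the question's data, with `.values` as suggested in the comments and `isin` as an alternative (the variable names are just for illustration):

```python
import pandas as pd

testData = pd.DataFrame({'value': ['abc', 'cde', 'fgh']})
values = testData['value']

# str.contains is a substring test, so a fragment such as 'ab' still matches.
substring_hit = bool(values.str.contains('ab').any())

# Exact element tests: compare whole strings instead.
exact_via_values = 'abc' in values.values          # membership in the underlying array
exact_via_isin = bool(values.isin(['abc']).any())  # vectorised, accepts several candidates
fragment_is_element = 'ab' in values.values        # 'ab' is a fragment, not an element

print(substring_hit, exact_via_values, exact_via_isin, fragment_is_element)
```

isin takes a list, so it is handy when several candidate strings need checking at once.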

OK, to address your last question:

In [270]:
testData['value'][0] in testData['value']

Out[270]:
False

This is because pd.Series.__contains__ is implemented as follows:

def __contains__(self, key):
    """True if the key is in the info axis"""
    return key in self._info_axis

If we look at what _info_axis actually is:

In [269]:
testData['value']._info_axis

Out[269]:
RangeIndex(start=0, stop=3, step=1)

Then we can see that when we do 'abc' in testData['value'], we're really testing whether 'abc' is in the index, which is why it returns False.

Example:

In [271]:
testData=pd.DataFrame({'value':['abc','cde','fgh']}, index=[0, 'turkey',2])
testData

Out[271]:
       value
0        abc
turkey   cde
2        fgh

In [272]:
'turkey' in testData['value']

Out[272]:
True

We can see that it returns True now because we're testing whether 'turkey' is present in the index.
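One way to remember this is that `in` on a Series behaves like `in` on a dict: it looks at the keys (index labels), not the values. A small sketch of the analogy, reusing the index from the example above (`s` and `d` are illustrative names):

```python
import pandas as pd

s = pd.Series(['abc', 'cde', 'fgh'], index=[0, 'turkey', 2])
d = s.to_dict()  # maps each index label to its value

# Labels / keys: both containers report True.
label_in_series = 'turkey' in s
label_in_dict = 'turkey' in d

# Values: `in` on the container itself does not look at them...
value_in_series = 'cde' in s
value_in_dict = 'cde' in d

# ...so ask for the values explicitly.
value_in_series_values = 'cde' in s.values
value_in_dict_values = 'cde' in d.values()

print(label_in_series, label_in_dict)
print(value_in_series, value_in_dict)
print(value_in_series_values, value_in_dict_values)
```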
