pandas loc to check if value contains any of several words or a string

Question

I have a pandas DataFrame with a column containing strings.

(I take the example from Check if string is in a pandas dataframe)

import pandas as pd

BabyDataSet = [
    ('Bob and martin and Andrew', 968),
    ('Jessica and julia and anthony', 155),
    ('Mary and john', 77),
    ('John', 578),
    ('Mel and diana', 973),
    ('martin bob diana and Andrew', 968)
]

a = pd.DataFrame(data=BabyDataSet, columns=['Names', 'Births'])

                           Names  Births
0      Bob and martin and Andrew     968
1  Jessica and julia and anthony     155
2                  Mary and john      77
3                           John     578
4                  Mel and diana     973
5    martin bob diana and Andrew     968

Giving a string like "martin andrew bob", I would like to filter the DataFrame to get the subset with the rows containing in the name all the words (in any order and case) of the string.

Is there a better way that using a for loop adding the masks as lists of booleans? Because this solution seems cumbersome to me.

score 1 · Accepted Answer · answered Oct 25 '21 at 16:01

This is my suggestion:

my_str = 'martin andrew bob'

a[a['Names'].str.lower()
            .str.split()
            .apply(set(my_str.lower().split()).issubset)
 ].reset_index(drop=True)

Output:

                          Names  Births
0     Bob and martin and Andrew     968
1   martin bob diana and Andrew     968

I'm adding lower() function to my_str, but if you're sure that string is always given in lowercase you can skip it.

pandas loc to check if value contains any of several words or a string

1 Answers1