2

I have a pandas DataFrame with a column containing strings.

(I take the example from Check if string is in a pandas dataframe)

import pandas as pd

BabyDataSet = [
    ('Bob and martin and Andrew', 968),
    ('Jessica and julia and anthony', 155),
    ('Mary and john', 77),
    ('John', 578),
    ('Mel and diana', 973),
    ('martin bob diana and Andrew', 968)
]

a = pd.DataFrame(data=BabyDataSet, columns=['Names', 'Births'])
                           Names  Births
0      Bob and martin and Andrew     968
1  Jessica and julia and anthony     155
2                  Mary and john      77
3                           John     578
4                  Mel and diana     973
5    martin bob diana and Andrew     968

Giving a string like "martin andrew bob", I would like to filter the DataFrame to get the subset with the rows containing in the name all the words (in any order and case) of the string.

Is there a better way that using a for loop adding the masks as lists of booleans? Because this solution seems cumbersome to me.

Henry Ecker
  • 34,399
  • 18
  • 41
  • 57
JFerro
  • 3,203
  • 7
  • 35
  • 88

1 Answers1

1

This is my suggestion:

my_str = 'martin andrew bob'

a[a['Names'].str.lower()
            .str.split()
            .apply(set(my_str.lower().split()).issubset)
 ].reset_index(drop=True)

Output:

                          Names  Births
0     Bob and martin and Andrew     968
1   martin bob diana and Andrew     968

I'm adding lower() function to my_str, but if you're sure that string is always given in lowercase you can skip it.

Arkadiusz
  • 1,835
  • 1
  • 8
  • 15