I need to filter a DataFrame on a string column using a list of search strings. Each string in the search list is only a substring of the values in the column, so a row should be kept if its value contains any of them.

import pandas as pd

df = pd.DataFrame(data={'text':['abc def', 'def ghi', 'poi opo', 'aswwf', 'abcs  sd'], 'id':[1, 2, 3, 4, 5]})

Out [1]:
    text     id
0   abc def  1
1   def ghi  2
2   poi opo  3
3   aswwf    4
4   abcs sd  5

search = ['abc', 'poi']

Required:


Out [2]:
    text     id
0   abc def  1
1   poi opo  3
2   abcs sd  5
keshav

2 Answers

Use Series.str.contains with boolean indexing. Join all values of the list with | so that the pattern acts as a regex OR:

pat = '|'.join(search)
df1 = df[df['text'].str.contains(pat)]
print(df1)
       text  id
0   abc def   1
2   poi opo   3
4  abcs  sd   5
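
A slightly more defensive variant of the same idea, offered as a sketch rather than part of the original answer: it assumes the search terms should be matched literally (hence re.escape), that missing values should count as non-matches (na=False), and that the index should be renumbered as in the required output (reset_index).

```python
import re
import pandas as pd

df = pd.DataFrame({'text': ['abc def', 'def ghi', 'poi opo', 'aswwf', 'abcs  sd'],
                   'id': [1, 2, 3, 4, 5]})
search = ['abc', 'poi']

# Escape each term so regex metacharacters in it are matched literally,
# then join the terms with | to build a regex OR pattern.
pat = '|'.join(re.escape(s) for s in search)

# na=False treats missing values in the column as non-matches.
# reset_index(drop=True) renumbers the rows 0, 1, 2 as in the question.
df1 = df[df['text'].str.contains(pat, na=False)].reset_index(drop=True)
print(df1)
```

Dropping the reset_index call keeps the original row labels (0, 2, 4) instead.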
jezrael

@jezrael's answer is great, provided the search strings contain no regex special characters such as |. Alternatively, you can test one search string at a time and combine the results with a logical OR at the end. If you want to search for strings that contain special characters, you can use:

df[pd.concat([df.text.str.contains(i, regex=False) for i in search], axis=1).any(axis=1)]

This gives the expected result:

       text  id
0   abc def   1
2   poi opo   3
4  abcs  sd   5
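
To illustrate why regex=False matters, here is a small sketch on made-up data (the frame and the search terms below are hypothetical, chosen only because they contain . and |). It uses the same concat-then-any approach as above and compares it with a naively joined regex pattern.

```python
import pandas as pd

# Hypothetical data whose values and search terms contain regex metacharacters
df = pd.DataFrame({'text': ['a.c def', 'abc ghi', 'x|y', 'aswwf'],
                   'id': [1, 2, 3, 4]})
search = ['a.c', '|']

# One boolean mask per search term, each matched as a plain substring
masks = [df['text'].str.contains(s, regex=False) for s in search]

# Put the masks side by side and keep rows where any of them is True
mask = pd.concat(masks, axis=1).any(axis=1)
result = df[mask]
print(result)

# For comparison: the unescaped joined pattern is 'a.c||', which contains
# an empty alternative and therefore matches every row.
naive = df['text'].str.contains('|'.join(search))
print(naive.all())
```

Only the rows with id 1 and 3 are kept by the literal search, whereas the naive pattern keeps all four rows.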
Serge Ballesta