0

I need to search a df column and return all substrings from a list.

myList= ['a cat', 'the dog', 'a cow']

example df
'col A'
there was a cat with the dog
the cow was brown
the dog was sick

this splits the words in the list and only returns single words

df['col B'] = df['col A'].apply(lambda x: ';'.join([word for word in x.split() if word in (myList)]))

also tried to add in an np any...

df['col B'] = df['col A'].apply(lambda x: ';'.join(np.any(word for word in df['col A'] if word in (myList))))

need to return

'col B'
a cat;the dog
NaN
the dog
leo
  • 333
  • 4
  • 12

2 Answers2

1

You may

s = df.col.str.extractall(f'({"|".join(myList)})')
res = s.groupby(s.index.get_level_values(0))[0].agg(';'.join)
df.loc[res.index, 'new'] = res

                            col            new
0  there was a cat with the dog  a cat;the dog
1             the cow was brown            NaN
2              the dog was sick        the dog
rafaelc
  • 57,686
  • 15
  • 58
  • 82
1

this should work, you were close:

import numpy as np

df['col B'] = df['col A'].apply(lambda x: ';'.join([m for m in myList if m in x])).replace('',np.nan)

result:

                          col A          col B
0  there was a cat with the dog  a cat;the dog
1             the cow was brown            NaN
2              the dog was sick        the dog
Derek Eden
  • 4,403
  • 3
  • 18
  • 31