I have a dataframe with a column 'text'.

I want to keep only the rows whose text column contains at least one of certain strings, and filter out everything else. My list of words is long: for example, crime, taxation, etc.

This works for one word:

data_cleaned = data_cleaned.loc[data_cleaned['text'].str.contains('population')].reset_index(drop = True)

How can I match multiple words, so that rows containing not only population but also crime, etc. are kept?

I see answers like this, but it does not work for me.

UPD.

My full list of words looks like this:

key_words = ['population',
                          'migration',
                          'crime',
                          'safety',
                          'taxation',
                          'taxes',
                          'weather', 
                          'climate',
                          'opportunities',
                          'employment',
                          'unemployment',
                          'cultural life',
                          'services',
                          'jobs',
                          'economic growth',
                          'economic decline',
                          'pollution',
                          'environment',
                          'health',
                          'insurance',
                          'education',
                          'natural disaster',
                          'retirement']
Anakin Skywalker

1 Answer

Assuming that lst is the list of strings (key_words in the question), the following would work:

def selector(s):
    for w in lst:
        if w in s:
            return True
    return False

data_cleaned = data_cleaned.loc[data_cleaned['text'].apply(selector)]
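
A vectorized alternative, as a rough sketch rather than a definitive solution: since str.contains treats its argument as a regular expression by default, the keywords can be joined with "|" into a single pattern. The sample dataframe and the shortened key_words list below are made up for illustration, and case=False is an assumption about the desired matching (the selector above is case-sensitive).

```python
import re

import pandas as pd

# Hypothetical sample data; the real data_cleaned comes from the question.
data_cleaned = pd.DataFrame({
    "text": [
        "Crime rates fell last year",
        "A recipe for apple pie",
        "New taxes on fuel",
        None,
    ]
})

# Shortened, illustrative version of the keyword list from the question.
key_words = ["population", "crime", "taxes", "economic growth"]

# Escape each keyword so it is matched literally, then join with "|" (regex OR).
pattern = "|".join(re.escape(w) for w in key_words)

# case=False ignores case (an assumption); na=False treats missing text as no match.
mask = data_cleaned["text"].str.contains(pattern, case=False, na=False)
filtered = data_cleaned.loc[mask].reset_index(drop=True)
print(filtered)
```

Like the loop above, this matches substrings, so 'taxes' would also match inside a longer word. If whole-word matching is needed, wrapping the pattern in word boundaries (r"\b(?:" + pattern + r")\b") is one option.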
bb1