I have a DataFrame with a column of strings and a list of words that I want to remove from those strings. However, I also want to keep, unchanged, any string in the DataFrame that is made up entirely of words from the list.
Here is an example:
strings_variable |
---|
Avalon Toyota loan |
Blazer Chevrolet |
Suzuki Vitara sales |
Vauxhall Astra |
Buick Special car |
Ford Aerostar |
car refund |
car loan |
import pandas as pd
data = {'strings_variable': ['Avalon Toyota loan', 'Blazer Chevrolet', 'Suzuki Vitara sales', 'Vauxhall Astra', 'Buick Special car', 'Ford Aerostar', 'car refund', 'car loan']}
df = pd.DataFrame(data)
words_to_remove = ('car', 'sales', 'loan', 'refund')
The final output should look like this:
strings_variable |
---|
Avalon Toyota |
Blazer Chevrolet |
Suzuki Vitara |
Vauxhall Astra |
Buick Special |
Ford Aerostar |
car refund |
car loan |
data = {'strings_variable': ['Avalon Toyota', 'Blazer Chevrolet', 'Suzuki Vitara', 'Vauxhall Astra', 'Buick Special', 'Ford Aerostar', 'car refund', 'car loan']}
df = pd.DataFrame(data)
Note that the words I want to remove appear alongside the car names. However, I would like to keep a row unchanged when its string consists only of words in words_to_remove (for example, 'car refund' and 'car loan').
Here is my code (Python) so far:
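def remove_words(text):
    # Removes the listed words from every row, so rows such as 'car loan' end up empty
    words = [word for word in text.split() if word not in words_to_remove]
    return ' '.join(words)

df['strings_variable'] = df['strings_variable'].apply(remove_words)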
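The behaviour I am after could be sketched roughly as follows. This is only a minimal sketch, assuming that words are separated by whitespace and must match exactly (case-sensitive); the helper name `remove_words_keep_full_matches` is hypothetical, and the example runs on a plain list rather than on the DataFrame.

```python
# Same sample strings and words as in the example above
strings = ['Avalon Toyota loan', 'Blazer Chevrolet', 'Suzuki Vitara sales',
           'Vauxhall Astra', 'Buick Special car', 'Ford Aerostar',
           'car refund', 'car loan']
words_to_remove = {'car', 'sales', 'loan', 'refund'}


def remove_words_keep_full_matches(text, stop_words=words_to_remove):
    """Hypothetical helper: drop listed words, but return the text
    unchanged when every word in it is listed."""
    # Assumption: words are whitespace-separated and compared exactly
    kept = [word for word in text.split() if word not in stop_words]
    # If nothing is left, the string was made up only of listed words
    return ' '.join(kept) if kept else text


cleaned = [remove_words_keep_full_matches(s) for s in strings]
```

With the DataFrame from the example, the same function could presumably be applied per row with `df['strings_variable'].apply(remove_words_keep_full_matches)`. Note that joining with a single space collapses any repeated whitespace in rows where a word is removed.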
I hope it makes sense - thank you in advance!