
I'm having a problem in Python pandas where I want to filter a DataFrame if a specific column contains any of the values for a specific key in my dictionary.

In  [1]: import pandas as pd
         data = {'col1': ['123apple645', '654banana213', '987orange815'], 
                 'col2': ['345mango987', '159peach357', '852apple258'], 
                 'col3':['654apple789', '324peach156', '358grapes854']}
         df = pd.DataFrame(data=data)
         dictionary = {'Mary':['apple', 'peach'], 'John':['peach', 'grapes']}

Out [1]:    col1            col2            col3
        0   123apple645     345mango987     654apple789
        1   654banana213    159peach357     324peach156
        2   987orange815    852apple258     358grapes854

I want to filter on col2 so that only rows whose value contains any of the substrings listed under the key "John" remain. So the result should only return index 1, because that is the only row whose col2 value contains any of the substrings corresponding to the key "John".

In  [2]: ???

Out [2]:    col1            col2            col3
        1   654banana213    159peach357     324peach156

My attempt was to use the .str.contains method, but this gives me an "unhashable type: 'list'" error:

Filtered_df = df[df['col2'].str.contains(dictionary['John'])]

1 Answer


I found an answer here: Pandas filtering for multiple substrings in series

The solution is to first join the search terms into a single pattern using the regex alternation ("|") operator, since .str.contains expects one pattern string rather than a list.

search_list = '|'.join(dictionary['John'])

Then apply this string as the filter criterion.

df[df['col2'].str.contains(search_list)]
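Putting it together with the data from the question, this is a minimal runnable sketch. The `re.escape` call is an optional safeguard I've added: `.str.contains` treats the pattern as a regex by default, so escaping protects you if a search term ever contains regex metacharacters.

```python
import re

import pandas as pd

data = {'col1': ['123apple645', '654banana213', '987orange815'],
        'col2': ['345mango987', '159peach357', '852apple258'],
        'col3': ['654apple789', '324peach156', '358grapes854']}
df = pd.DataFrame(data=data)
dictionary = {'Mary': ['apple', 'peach'], 'John': ['peach', 'grapes']}

# Join the substrings into one regex alternation: 'peach|grapes'.
search_list = '|'.join(re.escape(term) for term in dictionary['John'])

# Keep only rows where col2 matches any of the substrings.
filtered_df = df[df['col2'].str.contains(search_list)]
print(filtered_df)
```

Only index 1 survives the filter, since '159peach357' is the only col2 value containing 'peach' or 'grapes'.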