Let's say I have a dataframe df with a column named news_text:
news_text
lebron james is the great basketball player.
leonardo di caprio has won the oscar for best actor
avatar was directed by steven speilberg.
ronaldo has resigned from manchester united.
argentina beats france in fifa world cup 2022.
joe biden has won the president elections.
2026 fifa WC will be host by canada,mexico and usa combined.
I also have a large dictionary with hundreds of keys, something like:
{'category_1': ['lebron james', 'oscar', 'leonardo dicaprio'], 'category_2': ['basketball', 'steven speilberg','manchester united'],
'category_3': ['ronaldo', 'argentina','world cup']...so on}
I want to perform exact keyword matching between the dictionary values (each of which is a list of keywords) and df['news_text']. When a keyword matches, the corresponding dictionary key should be added to a new column, mapped_category, as a list. If no keyword from any of the lists is found, the column value should be NA. The final output should look something like:
news_text                                                     mapped_category
lebron james is the great basketball player.                  ['category_1', 'category_2']
leonardo di caprio has won the oscar for best actor           ['category_1','category_1']
avatar was directed by steven speilberg.                      ['category_2']
ronaldo has resigned from manchester united.                  ['category_2','category_3']
argentina beats france in fifa world cup 2022.                ['category_3','category_3']
joe biden has won the president elections.                    NA
2026 fifa WC will be host by canada,mexico and usa combined.  NA
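
To make the requirement concrete, here is a minimal sketch of one way the matching described above could work. It is only an illustration under a few assumptions: build_patterns and map_categories are hypothetical helper names, matching is case-sensitive and on whole words, a category is repeated once per matched keyword (as the repeated 'category_3' in the expected output suggests), and None stands in for NA. Note that under strictly exact matching the keyword 'leonardo dicaprio' does not occur in "leonardo di caprio", so the second row yields a single 'category_1' here.

```python
import re

def build_patterns(keyword_map):
    """Compile one whole-word regex per keyword, grouped by category (hypothetical helper)."""
    return {
        category: [re.compile(r"\b" + re.escape(kw) + r"\b") for kw in keywords]
        for category, keywords in keyword_map.items()
    }

def map_categories(text, patterns):
    """Return one category label per matched keyword, or None when nothing matches."""
    matched = [
        category
        for category, regexes in patterns.items()  # dictionary order decides label order
        for regex in regexes
        if regex.search(text)
    ]
    return matched or None

# Sample data copied from the question (only the three categories shown there).
keyword_map = {
    "category_1": ["lebron james", "oscar", "leonardo dicaprio"],
    "category_2": ["basketball", "steven speilberg", "manchester united"],
    "category_3": ["ronaldo", "argentina", "world cup"],
}
texts = [
    "lebron james is the great basketball player.",
    "leonardo di caprio has won the oscar for best actor",
    "avatar was directed by steven speilberg.",
    "ronaldo has resigned from manchester united.",
    "argentina beats france in fifa world cup 2022.",
    "joe biden has won the president elections.",
    "2026 fifa WC will be host by canada,mexico and usa combined.",
]

# Compile once, then reuse the patterns for every row.
patterns = build_patterns(keyword_map)
mapped = [map_categories(text, patterns) for text in texts]

for text, categories in zip(texts, mapped):
    print(text, categories)
```

With the dataframe from the question, the same function could presumably be applied as df['mapped_category'] = df['news_text'].apply(lambda t: map_categories(t, patterns)). Compiling the patterns once up front keeps the per-row work to regex searches only, which matters with hundreds of keys. Passing re.IGNORECASE to re.compile would be one option if case should be ignored.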