I have searched SO and other forums and found solutions for handling plurals, but my case is different: the words come from an Excel file.
I have a long list of keywords and I am passing that list to my regex like below:
import re
import pandas as pd
df = pd.read_excel('\\Keywords.xlsx', sheet_name=0)
keyword_list = df['Keyword_List'].tolist()
keywords_regex = r'(({0})\b)'.format('|'.join(keyword_list))
I have to keep \b at the end because I have words like "Meet" and don't want words like "Meeting" to be matched.
I have a large paragraph of text and I want to check how many of the words in my keyword list occur in it, including their plurals. So, if the paragraph contains both "Boy" and "Boys", I want both to be matched. Currently the code below matches only the singular form:
matches = re.findall(keywords_regex, text, re.IGNORECASE) ## text is the long paragraph
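To make the behaviour reproducible without the Excel file, here is a minimal sketch. The two-word list and the sample sentence are hypothetical stand-ins for the Keyword_List column and the real paragraph:

```python
import re

# Hypothetical stand-in for df['Keyword_List'].tolist()
keyword_list = ["Boy", "Meet"]
keywords_regex = r'(({0})\b)'.format('|'.join(keyword_list))

# Hypothetical sample text
text = "The boy will meet the boys before the meeting."
matches = re.findall(keywords_regex, text, re.IGNORECASE)

# The pattern has two groups, so findall returns one tuple per match;
# take the outer group to get the matched text.
found = [m[0] for m in matches]
print(found)  # ['boy', 'meet']
```

"boys" is skipped for the same reason "meeting" is: the \b cannot match between "boy" and the following "s".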
I can always add the plural forms of the words to the Excel file to get the match, but I am looking for a way to handle this at the regex or Python level only.
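For illustration, this is roughly the kind of regex-level handling I have in mind. It is only a sketch: it assumes regular English plurals formed with "s" or "es", it adds a leading \b and re.escape that are not in my original pattern, and the keyword list and text are hypothetical. Irregular plurals such as "children" would not be covered.

```python
import re

# Hypothetical sample keywords; in practice these come from the Excel column.
keyword_list = ["Boy", "Meet", "Box"]

# re.escape guards against keywords that contain regex metacharacters.
alternation = '|'.join(re.escape(k) for k in keyword_list)

# (?:es|s)? optionally consumes a plural suffix before the closing \b,
# so "Meeting" is still rejected while "Boys" and "Boxes" are accepted.
plural_regex = r'\b(?:{0})(?:es|s)?\b'.format(alternation)

text = "The boy will meet the boys and open the boxes before the meeting."
plural_matches = re.findall(plural_regex, text, re.IGNORECASE)
print(plural_matches)  # ['boy', 'meet', 'boys', 'boxes']
```

Because the groups are non-capturing, findall returns the matched strings directly. One side effect is that verb forms such as "meets" would also match, which may or may not be wanted.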