I have a list of company names that I want to match against a list of sentences. For every sentence that contains one of the names, I want the start and end index of the match.
I wrote code that matches the keywords exactly, but realized that the names in the sentences won't always be an exact match. For example, my keyword list can contain Company One Two Ltd
but the sentences can be:
Company OneTwo Ltd won the auction
Company One Two Limited won the auction
The auction was won by Co. One Two Ltd
and other variations.
Given a company name, I want to find the start and end index of its occurrence in a sentence even when the sentence contains a variation of the name rather than an exact match. Below is the code I wrote for exact matching.
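import re

def find_index(texts, target):
    # Escape the name so characters such as '.' in "Co." are matched literally
    pattern = re.compile(re.escape(target))
    idxs = []
    for i, each_sent in enumerate(texts):
        # One list of (sentence index, start, end) tuples per matching sentence
        matches = [(i, m.start(0), m.end(0)) for m in pattern.finditer(each_sent)]
        if matches:
            idxs.append(matches)
    return idxs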
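
To show the limitation concretely, here is a small self-contained run of the exact matcher against the example sentences. The first sentence, with the exact spelling, is one I added for comparison, and the `re.escape` call is my assumption that the names should be treated as literal text rather than as regex patterns.

```python
import re

def find_index(texts, target):
    # Exact (literal) matching only; re.escape is an assumption that
    # company names are plain text, not regular expressions.
    pattern = re.compile(re.escape(target))
    idxs = []
    for i, each_sent in enumerate(texts):
        matches = [(i, m.start(0), m.end(0)) for m in pattern.finditer(each_sent)]
        if matches:
            idxs.append(matches)
    return idxs

sentences = [
    "Company One Two Ltd won the auction",      # exact spelling (added for comparison)
    "Company OneTwo Ltd won the auction",       # missing space
    "Company One Two Limited won the auction",  # "Limited" instead of "Ltd"
    "The auction was won by Co. One Two Ltd",   # "Co." instead of "Company"
]

result = find_index(sentences, "Company One Two Ltd")
print(result)  # [[(0, 0, 19)]]
```

Only sentence 0 is reported. The three variations return nothing, and those are the cases for which I still need the start and end positions.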