I have a list of entities that look something like this:
["Bluechoice HMO/POS", "Pathway X HMO/PPO", "HMO", "Indemnity/Traditional Health Plan/Standard"]
This is not the exhaustive list; there are other, similar entries.
I want to extract these entities, if present, from a text file (over 30 pages of information). The catch is that the text file was generated by OCR, so it might not contain the entries exactly as written. For example, it might contain:
"Out of all the entries the user made, BIueChoise HMOIPOS is the most prominent"
Notice the OCR errors in "BIueChoise HMOIPOS" compared with "Bluechoice HMO/POS": "l" was read as "I", "c" as "s", and "/" as "I".
I want to find every entity from the list that appears in the text file, even when the text does not match the entity character for character.
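To make the requirement concrete, here is a minimal sketch of the kind of matching I mean, assuming Python and only its standard-library difflib module. The function name find_entities, the whitespace tokenisation, and the 0.8 threshold are illustrative assumptions, not a tuned or tested solution. It slides a window with the same number of words as each entity over the text and keeps the best-scoring window if it is similar enough:

```python
from difflib import SequenceMatcher


def find_entities(text, entities, threshold=0.8):
    """Return (entity, matched_span, score) for each entity that
    approximately occurs in text. Hypothetical helper for illustration."""
    tokens = text.split()
    hits = []
    for entity in entities:
        n = len(entity.split())   # window size = number of words in the entity
        target = entity.lower()   # compare case-insensitively
        best = None
        for i in range(len(tokens) - n + 1):
            span = " ".join(tokens[i:i + n])
            # ratio() is 2 * matched_chars / total_chars, a value in [0, 1]
            score = SequenceMatcher(None, target, span.lower()).ratio()
            if score >= threshold and (best is None or score > best[2]):
                best = (entity, span, score)
        if best is not None:
            hits.append(best)
    return hits


entities = [
    "Bluechoice HMO/POS",
    "Pathway X HMO/PPO",
    "HMO",
    "Indemnity/Traditional Health Plan/Standard",
]
text = ("Out of all the entries the user made, "
        "BIueChoise HMOIPOS is the most prominent")

for entity, span, score in find_entities(text, entities):
    print(f"{entity!r} ~ {span!r} (score {score:.2f})")
```

This compares every window against every entity, so it may be too slow or too crude for 30+ pages and a long entity list, and a fixed word-count window will miss cases where OCR merges or splits words. I am looking for something more robust than this.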
Any help, be it an algorithm or a general approach, is welcome. Thanks a lot!