I want to match similar strings that share the same significant word.
Problem:
I have two files: a master file and an input file. I need to iterate through the input file and, for each record, find the similar record in the master. Currently I have indexed the master file in Elasticsearch and query it for similar records, but because the master contains many near-identical records, it returns many hits, and picking the appropriate one from them is the problem.
Sample Input record:
1. H1 Bulbs Included
Sample Output from Elasticsearch:
1. Included H1 [Correct One]
2. H7 Bulbs Included
3. H8 Bulbs Provided
4. H1 not Included [Should not match this]
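To make the desired behaviour concrete, here is a minimal sketch of the post-filtering I am after on the sample above. It assumes the significant term (H1 here) is already known, and the `NEGATIONS` word list and the `filter_candidates` helper are hypothetical names chosen only for illustration, not part of Elasticsearch:

```python
import re

# Assumed negation words; this list is illustrative, not exhaustive.
NEGATIONS = {"not", "no", "without"}

def filter_candidates(significant, candidates):
    """Keep candidates that contain the significant term as a whole
    token and contain none of the assumed negation words."""
    key = significant.lower()
    kept = []
    for candidate in candidates:
        # Lowercase and split into alphanumeric tokens.
        tokens = re.findall(r"[a-z0-9]+", candidate.lower())
        if key in tokens and not NEGATIONS.intersection(tokens):
            kept.append(candidate)
    return kept

hits = [
    "Included H1",
    "H7 Bulbs Included",
    "H8 Bulbs Provided",
    "H1 not Included",
]
print(filter_candidates("H1", hits))
```

On these four hits only "Included H1" survives, which is the result I want. The open question is how to find the significant term automatically.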
I tried using a POS tagger to extract the important terms, but it does not work well.
POS Tagger Output:
1. H1/NNP Included/NNP
2. H8/NNP Bulbs/NNP Provided/NNP
How should I proceed with this?
Edit:
In the above example, H1 is the significant term.
Sample Input Record:
1. H1 Bulbs included
Sample Output from Elasticsearch:
1. H2 Bulbs Included
2. H3 Bulbs Included
3. H1 [Correct One]
First I need to identify the significant word. So far I have found no fixed pattern that the significant word follows.
For example (significant word in brackets):
1. H1 bulbs [H1]
2. 9600 added [9600]
3. It has H8 [H8]
4. 1/2 wire for 4500 bulb [4500]
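One observation on these four samples only: the significant word is always the first purely alphanumeric token that contains a digit (which skips "1/2" because of the slash). A minimal sketch of that heuristic follows. It is an assumption drawn from the samples, not a rule I can rely on for the full data, and `significant_token` is a hypothetical helper name:

```python
def significant_token(text):
    """Return the first whitespace-separated token that is purely
    alphanumeric and contains at least one digit, or None if there
    is no such token. Heuristic based on the four samples only."""
    for token in text.split():
        if token.isalnum() and any(ch.isdigit() for ch in token):
            return token
    return None

samples = [
    "H1 bulbs",
    "9600 added",
    "It has H8",
    "1/2 wire for 4500 bulb",
]
for record in samples:
    print(record, "->", significant_token(record))
```

This reproduces the bracketed words for the samples above, but it would pick the wrong token for any record where another plain number or code appears first, so I am looking for something more robust.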