I have a catalog of electronic products stored in a SQL database, in fields/columns such as Title, Mfg Part Nr, UPC, etc. I then crawl external websites that list electronic products, e.g. Amazon. For the most part this results in some HTML text, from which I can extract, for example, the Title. I need to determine whether this HTML text (the result of a page on an external website) describes a product I have.
I understand that this comparison will not be exact, i.e. I am not expecting it to be correct 100% of the time. Is there any way to do this?
While it would be difficult to provide a complete example, let us limit the comparison to just the Titles of two products.
Title I have: Motorola Talkabout MH230R Portable - two-way radio - FRS/GMRS 22-channel - yellow ( pack of 3 )
Amazon’s Title: Motorola MH230TPR Giant Rechargeable Two Way Radio 3 Pack, FRS/GMRS
These represent the same product. Is there any way to determine whether they are similar or the same? A simple text comparison would not do.
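To illustrate why a plain string comparison fails on these two titles, here is a minimal sketch of one common baseline: token-set (Jaccard) similarity. This is an assumption about a possible starting point, not an established answer; the class and method names (`TitleSimilarity`, `tokens`, `jaccard`) are hypothetical. Each title is lower-cased, split on anything that is not a letter or digit, and the score is the number of shared tokens divided by the number of distinct tokens overall.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: a simple token-overlap baseline for comparing product titles.
public class TitleSimilarity {

    // Lower-case the title, split on any run of non-alphanumeric characters,
    // and collect the non-empty tokens into a set (duplicates collapse).
    static Set<String> tokens(String title) {
        Set<String> result = new HashSet<>();
        for (String t : title.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                result.add(t);
            }
        }
        return result;
    }

    // Jaccard similarity |A ∩ B| / |A ∪ B|, a value between 0 and 1.
    static double jaccard(String a, String b) {
        Set<String> ta = tokens(a);
        Set<String> tb = tokens(b);
        Set<String> intersection = new HashSet<>(ta);
        intersection.retainAll(tb);
        Set<String> union = new HashSet<>(ta);
        union.addAll(tb);
        return union.isEmpty() ? 0.0 : (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        String mine = "Motorola Talkabout MH230R Portable - two-way radio - FRS/GMRS 22-channel - yellow ( pack of 3 )";
        String amazon = "Motorola MH230TPR Giant Rechargeable Two Way Radio 3 Pack, FRS/GMRS";
        System.out.println(mine.equalsIgnoreCase(amazon));         // false
        System.out.printf("%.3f%n", jaccard(mine, amazon));        // 0.444 (8 shared of 18 distinct tokens)
    }
}
```

Note that the model numbers `MH230R` and `MH230TPR` count as different tokens here, so a real matcher would probably need something extra for part numbers (for example an edit-distance or common-prefix check) and a threshold tuned on known matching and non-matching pairs.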
It would be great if there are tools out there to handle this problem. If not, I'd appreciate an algorithm or some pointers I could use to research this area further.
I know C# and Java. I have used a bit of AI/neural networks for numerical analysis, particularly back-propagation and genetic algorithms, to compare images and find optimal points. However, I have no clue how to handle text data.
Please let me know if this question is unclear, and I will try to clarify my description. Thank you all.