Okay, I've seen lots of posts about fuzzy string matching, Levenshtein distance, longest common substring, and so on. None of them seems to fit what I want to do. I'm pulling product results from a variety of web services, and from those I can build a big list of names for the product. These names may include a bunch of variable junk. Here are some examples, from SearchUPC:
Apple 60W magsafe adapter L-shape with extension cord
Original Apple 60W Power Adapter (L-shaped Connector) for MacBook MC461LL/A with AC Extension Wall Cord (Bulk Packaging)
Current Apple MagSafe 60W Power Adapter for MacBook MC461LL/A with AC Extension Wall Cord (Bulk Packaging)
Apple 60W MagSafe Power Adapter - Apple Mac Accessories
Apple - MagSafe 60W Power Adapter for MacBook and 13" MacBook Pro
MagSafe - power adapter - 60 Watt
etc. What I'd like to do is pull out the common product name (which to my heuristic human eye is obviously "Apple 60W MagSafe Power Adapter"), but none of the aforementioned methods seems likely to work. My main problem is that I don't know what to search the list of strings for. At first, I was thinking of trying longest common substring, but it seems like that will fail because a bunch of the strings have the words in a different order. That might yield a product name like "power adapter", which is not terribly useful to the user.
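To make the concern about longest common substring concrete, here is a minimal Python sketch of a brute-force version run over the six sample names. The function name and the case-insensitive comparison are my own choices for illustration, not part of any library.

```python
def longest_common_substring(strings):
    """Return the longest substring present in every string (case-insensitive)."""
    lowered = [s.lower() for s in strings]
    shortest = min(lowered, key=len)
    # Try candidate lengths from longest to shortest; the first hit wins.
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in s for s in lowered):
                return candidate
    return ""

names = [
    "Apple 60W magsafe adapter L-shape with extension cord",
    "Original Apple 60W Power Adapter (L-shaped Connector) for MacBook "
    "MC461LL/A with AC Extension Wall Cord (Bulk Packaging)",
    "Current Apple MagSafe 60W Power Adapter for MacBook MC461LL/A "
    "with AC Extension Wall Cord (Bulk Packaging)",
    "Apple 60W MagSafe Power Adapter - Apple Mac Accessories",
    'Apple - MagSafe 60W Power Adapter for MacBook and 13" MacBook Pro',
    "MagSafe - power adapter - 60 Watt",
]

result = longest_common_substring(names)
print(repr(result))
```

On these six names the result is just "adapter" with surrounding spaces, which is even less useful than "power adapter": one name lacks "Power", another lacks "MagSafe", and one lacks "Apple", so none of those words survives.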
Note: the vast majority of the records returned from the SearchUPC API (mostly omitted here) do include the literal string "Apple 60W MagSafe Power Adapter".
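If most records really do contain the name verbatim, one possible way to exploit that is to count word-level phrases across the list and keep the one that is both frequent and long. The Python sketch below is only an illustration under that assumption: `most_common_phrase`, the "count times length" score, and the two shortened records in `records` are hypothetical, not an established algorithm or real API output.

```python
import re
from collections import Counter

def most_common_phrase(names, min_words=2, max_words=6):
    """Return the word phrase with the highest (names containing it) * (word count)."""
    counts = Counter()
    display = {}  # lowercase phrase -> first-seen original casing
    for name in names:
        words = re.findall(r"[A-Za-z0-9]+", name)
        seen = set()
        for n in range(min_words, max_words + 1):
            for i in range(len(words) - n + 1):
                chunk = words[i:i + n]
                key = tuple(w.lower() for w in chunk)
                seen.add(key)
                display.setdefault(key, " ".join(chunk))
        counts.update(seen)  # count each phrase at most once per name
    if not counts:
        return ""
    # Hypothetical score: favour phrases that are both common and long.
    best = max(counts, key=lambda k: counts[k] * len(k))
    return display[best]

# First and last entries are from the samples above; the middle two are
# made-up stand-ins for the omitted records that contain the literal name.
records = [
    "Apple 60W MagSafe Power Adapter - Apple Mac Accessories",
    "Apple 60W MagSafe Power Adapter",
    "New Apple 60W MagSafe Power Adapter for MacBook",
    "MagSafe - power adapter - 60 Watt",
]

print(most_common_phrase(records))
```

The weighting matters: ranking by frequency alone would favour short fragments such as "Power Adapter", which appear in more names than the full product name does.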
I'm implementing this in Objective-C, for iOS, but I'm really interested in the algorithm more than the implementation, so any language is acceptable.