I have a use case where I need to do complicated string matching on records of which there are about 5.1 Million of. When I say complicated string matching, I mean using library to do fuzzy string matching. (http://blog.bripkens.de/fuzzy.js/demo/)
The database we use at work is SAP Hana which is excellent for retrieving and querying because it's in memory so I would like to avoid pulling data out of there and re-populating it in memory on the application layer but at the same time I cannot take advantages of the libraries (there is an API for fuzzy matching in the DB but it's not comprehensive enough for us).
What is the middle ground here? If I do pre-processing and associate words in the DB with certain keywords the user might search for I can cut down the overhead but are there any best practises that are employed when It comes to this ?
If it matters. The list is a list of Billing Descriptors (that show up on CC statements) therefore, the user will search these descriptors to find out which companies the descriptor belongs too.