1

I have table with millions of rows. Now when User make any spelling mistake while searching string or word from table, I want to recommend user correct word or string from table. I am using jaro-winkler algorithm to compare distance of string, but as my database is huge my query is taking so much time in execution. Some how I will have to minimize data before applying jaro-winkler algorithm.

Is there any other better way through which I can get quick search recommendation from large table.

I want to build similar search like google search without auto suggestion. I am using my sql database.

Please suggest.

JP711
  • 93
  • 1
  • 10

1 Answers1

0

You probably want to try a SOUNDEX https://en.wikipedia.org/wiki/Soundex query. https://msdn.microsoft.com/en-us/library/ms187384.aspx

  • ALTER the table and add a column for soundex of the word
  • UPDATE the table to fill in the soundex for the word
  • Add an INDEX on the soundex column
  • SELECT * FROM Table WHERE WordSoundex = SOUNDEX(@query)
Louis Ricci
  • 20,804
  • 5
  • 48
  • 62
  • Hey thanks for your reply. your suggestion is working for some kind of scenarios. If my table has keywords "lisa" and "lake" then soundex of both keywords will be same so if I compare it with soundex of "leke" then I am getting both keywords as output and as I have so many similar kind of words I am not getting perfect match. Where Jaro Winkler is giving distance of "leke" and "lake" smaller then "leke" and "lisa". So I guess jaro winkler is more suitable option for me. Only concern is time required by Jaro - Winkler is to much and i need to shorten down that. – JP711 Oct 06 '15 at 08:50
  • @user5396801 - All you need to do is process the results of your soundex query with your JW distance function t get your final output. Sundex will cull the huge table down to a managable size, so that you can use the expensive algorithm on what's left. – Louis Ricci Oct 06 '15 at 11:16
  • that's a really great option. I will try and i guess 99% it will work fast. But I found another issue with two keywords. "watermark" and "waterman". "Watermark" is present in database but if user enter "waterman" ideally i should suggest "watermark" as , as per JW both are 97% closer. but soundex of "watermark" is W36562 and "waterman" is "W365" so it is not matching it. – JP711 Oct 06 '15 at 11:57
  • @user5396801 - that's curious behavior because the standard spec for SOUNDEX truncates the results to 4 characters so W36562 (6 characters) would be impossible. – Louis Ricci Oct 06 '15 at 13:30
  • I am getting 6 characters in mysql, is there any way we can reduce it to four.. unfortunately i am not able to send you screen shot ... Please check below link: https://dev.mysql.com/doc/refman/5.1/en/string-functions.html#function_soundex – JP711 Oct 06 '15 at 13:57
  • @JP711 - mySQL has the LEFT() function which can truncate to the first 4 characters. LEFT(SOUNDEX(@query), 4) – Louis Ricci Oct 06 '15 at 14:25
  • Thats great... so just last question. Changing string from 6 to 4 characters will not affect final results. right? – JP711 Oct 06 '15 at 15:19
  • @JP711 - of course it affects the results, doing it will allow waterman and watermark to match. – Louis Ricci Oct 06 '15 at 15:50