5

I have a MySQL table with the following columns:

City      Country  Continent
New York  States   Noth America
New York  Germany  Europe - considering there's one ;)
Paris     France   Europe

If I want to find "New Yokr" with a typo, it's easy with a MySQL stored function:

$querylev = "select City, Country, Continent FROM table 
            WHERE LEVENSHTEIN(`City`,'New Yokr') < 3"

But in case there are two New York cities, searching with fulltext you can put "New York States" and you get your desired result.

So the question is, could I search "New Yokr Statse" and get the same results?

Is there any function merging levenshtein and fulltext to make an all in one solution or should I create a new column in MySQL concatenating the 3 columns?

I know there are other solutions such as lucene or Sphinx (also soundex, metaphone, but not valid for this) but I think for me could be kind of hard to implement them.

Emma
  • 27,428
  • 11
  • 44
  • 69
TrOnNe
  • 1,632
  • 16
  • 30
  • First, have you tried it yourself? I don't think you could get both from that since New Yokr Statse has a distance of 4 from New York States. – Kai Qing Feb 17 '13 at 18:28
  • what do you mean about trying it myself? I'm trying different ways, but far from reaching a solution :( for example splitting each word into tokens and calling the levenshtein distance, but for that I have to split every word, and doens't seems to be a good solution – TrOnNe Feb 18 '13 at 19:35
  • I mean that you asked "could I search "New Yokr Statse" and get the same results?" - a simple test would tell you no. but then you could also mean "how do I modify this to accept this other case as well" - hard to say from the post. I myself do not have an answer via mysql alone under this construct. Aside massive data collecting and user behavior logging, I don't know how to reliably store and reference mistypes. Like a "did you mean" feature, which is what this looks like. – Kai Qing Feb 18 '13 at 19:39
  • 1
    What about 'SELECT CONCAT(city, ' ', country, ' ', continent) full FROM table UNION SELECT CONCAT(city, ' ', country) full FROM table UNION 'SELECT City full FROM table WHERE LEVENSHTEIN(Full, _search Term_) < 3' ? – collapsar Feb 19 '13 at 08:49
  • hmmm, let's try.. I'll let you know, thanks! Still looking for a way :) – TrOnNe Feb 24 '13 at 12:30
  • This may help you: https://gordonlesti.com/fuzzy-fulltext-search-with-mysql/ – Phellipe Ribeiro Nov 10 '15 at 17:46

1 Answers1

0

Great question, and a good example of how we can use char lists and regular expression boundaries to design queries and retrieve the data that we wish.

Depending on the accuracy we might want and the data we are having in the database, we can surely design custom queries based on a variety of expressions, such as this example for New York State with various types:

([new]+\s+[york]+\s+[stae]+)

Here, we have three char lists that we can update with other possible letters.

[new]
[york]
[stae]

We have also added two sets of \s+ as our boundaries here to increase the accuracy.

DEMO

This snippet just shows that how the capturing groups work:

const regex = /([new]+\s+[york]+\s+[stae]+)/gmi;
const str = `Anything we wish to have before followed by a New York Statse then anything we wish to have after. Anything we wish to have before followed by a New  Yokr  State then anything we wish to have after. Anything we wish to have before followed by a New Yokr Stats then anything we wish to have after. Anything we wish to have before followed by a New York Statse then anything we wish to have after. `;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}

PHP

$re = '/([new]+\s+[york]+\s+[stae]+)/mi';
$str = 'Anything we wish to have before followed by a New York Statse then anything we wish to have after. Anything we wish to have before followed by a New  Yokr  State then anything we wish to have after. Anything we wish to have before followed by a New Yokr Stats then anything we wish to have after. Anything we wish to have before followed by a New York Statse then anything we wish to have after. ';

preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);

// Print the entire match result
var_dump($matches);
Community
  • 1
  • 1
Emma
  • 27,428
  • 11
  • 44
  • 69