1

I am in the process of learning MySQL and querying, and right now working with PHP to begin with. For learning purposes I chose a small anagram solver kind of project to begin with. I found a very old English language word list on the internet freely available to use as the DB. I tried querying, find in set and full-text search matching but failed.

How can I:

Match a result letter by letter?

For example, let's say that I have the letters S-L-A-O-G to match against the database entry.

Since I have a large database which surely contains many words, I want to have in return of the query:

lag
goal
goals
slag
log
... and so on.

Without having any other results which might have a letter used twice.

How would I solve this with SQL?

Thank you very much for your time.

Phil
  • 13,875
  • 21
  • 81
  • 126

3 Answers3

1
$str_search = 'SLAOG';

SELECT word
FROM table_name
WHERE word REGEXP '^[{$str_search}]+$' # '^[SLAOG]+$'

// Filter the results in php afterwards

// Loop START

$arr = array();
for($i = 0; $i < strlen($row->word); $i++) {

    $h = substr($str_search, $i, 0);
    preg_match_all("/{$h}/", $row->word, $arr_matches);
    preg_match_all("/{$h}/", $str_search, $arr_matches2);

    if (count($arr_matches[0]) > count($arr_matches2[0]))
        FALSE; // Amount doesn't add up

}

// Loop END

Basicly run a REGEXP on given words and filter result based on how many occurencies the word compared with the search word.

The REGEXP checks all columns, from beginning to end, with a combination of given words. This may result in more rows then you need, but it will give a nice filter nonetheless.

The loop part is to filter words where a letter is used more times then in the search string. I run a preg_match_all() on each letter in found the word and the search word to check the amount of occurencies, and compare them with count().

Robin Castlin
  • 10,956
  • 1
  • 28
  • 44
  • Thank you @robin. I did not know at all that regexp could be performed as an SQL query. Could you please kindly explain though a little bit the regex in the query and in the php code? Thank you very much. – Phil May 11 '12 at 13:26
1

If you want a quick and dirty solution....

Split the word you're trying to get anagrams for into individual letters. Assign each letter an individual prime number value, and multiply them all together; eg:

C - 2
A - 3
T - 5

For a total of 30

Then step through your dictionary list, and do the same operation on each word in that. If your target word's value is divisible exactly by the dictionary word's value, then you know that the dictionary word has only letters that occur in your target word.

You can speed it up by pre-calculating the dictionary values, and then querying for just the right values: SELECT * FROM dictionary WHERE ($searchWordTotal % wordTotal) = 0 (searchWordTotal is the total for the word you're looking for, and wordTotal is the one from the database)

I should get around to writing this properly one of these days....

andrewsi
  • 10,807
  • 132
  • 35
  • 51
  • That's awesomely clever. :) But why test the remainder of a division, why not just `WHERE $searchWordTotal = wordTotal`? – ZeroOne May 11 '12 at 14:54
  • 1
    If you want to find proper anagrams that re-use all the letters, then yes, you'd look for ones that match; but the poster also wanted to return words made up of some of the letters, in which case you need to do the modulus check. All I need to do now is figure out how to easily find anagrams made up of multiple words, without a lot of nasty coding.... – andrewsi May 11 '12 at 14:58
  • Actually... By the time this routine's finished, I'll have a list of words that are partial anagrams of the main word. So if a word isn't a full match, I can figure out what letters I still need to find, and search the list I've just generated for words that have those letters. And the great part is, I can even do that recursively, so it'll pick up multiple words. So searching for TIGER will find ERG; and I can then search my list for words that match TI and find IT. Why do I get the feeling I'm going to be spending the weekend coding this up? – andrewsi May 11 '12 at 15:58
  • Seems like a viable way of spending a weekend. ;) – ZeroOne May 11 '12 at 16:57
  • I've got an initial version coded up in perl; I've spent the last couple of days trying to figure out how to optimise it. – andrewsi May 15 '12 at 23:42
0

since you only want words with the letters given, and no others, but you dont need to use all the letters, then i suggest logic like this:

* take your candidate word,
* do a string replace of the first occurrence of each letter in your match set,
* set the new value to null
* then finally wrap all that in a strlength to see if there are any characters left.

you can do all that in sql - but a little procedure will probably look more familiar to most coders.

Randy
  • 16,480
  • 1
  • 37
  • 55