return string most like another string in PHP

Question

I'm trying to build a spell checking system in PHP.

I already have a lexicon / corpus of many of the words in the English language, with each word on its own line.

What I'm trying to do is, given a string (let's call it $string1), open the text file, search it for the word most like $string1, and return that word.

For searching the text file, my idea is to import the contents into an array and then use in_array(). If there is a better way to search the text file without loading all the words into memory, please let me know.

I don't know how to compare the two strings for similarities. How would I do that?

In my mind, a word with all the same letters in a different order would rank higher than one with different letters, and a partial match that is off by a character or two would rank higher than both.

I would greatly appreciate any help with opening the text file and searching it and comparing the strings.

Why not use a database? Then you can index your words for a much faster search — Mark Baker, Jun 01 '13 at 15:11
@MarkBaker This project is part of a #nosql movement so I can't. What are those functions? I haven't heard of them. — irfan mir, Jun 01 '13 at 16:25
They're basically algorithms for measuring the difference between words: typically you use them with a threshold of commonality (ie. 90% similarity) — Mark Baker, Jun 01 '13 at 16:31
Seems to me that this is one instance where #nosql is being taken to the extremes of refusing to use the best tool for the job — Mark Baker, Jun 01 '13 at 16:31
@MarkBaker haha maybe the #nosql movement has gone too far. But, I am interested in these functions. Could you post an answer with more on them and how I can use them? — irfan mir, Jun 01 '13 at 21:06
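
To illustrate the "threshold of commonality" idea from the comments, here is a minimal sketch. It assumes PHP's built-in similar_text(), which may not be one of the functions the comments refer to, and isCloseEnough() is a hypothetical helper name chosen for this example.

```php
<?php
// Hypothetical helper: returns true when $candidate is at least
// $threshold percent similar to $input according to similar_text().
function isCloseEnough(string $input, string $candidate, float $threshold = 90.0): bool
{
    // similar_text() writes a similarity percentage (0-100) into its third argument.
    similar_text($input, $candidate, $percent);
    return $percent >= $threshold;
}

var_dump(isCloseEnough('World', 'World')); // identical strings score 100%
var_dump(isCloseEnough('apple', 'zzzzz')); // no characters in common scores 0%
```

The 90% default only mirrors the figure mentioned in the comment; a real spell checker would probably need to tune it against its own word list.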

score 0 · Answer 1 · answered Jun 01 '13 at 21:59

References for the three algorithms that I mentioned in the comments:
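
As a rough sketch of how such a comparison could be applied to the word file from the question, the example below reads the file one line at a time and keeps the word with the smallest edit distance. It assumes PHP's built-in levenshtein(), which may or may not be one of the three algorithms meant here; closestWord() and its parameters are hypothetical names used only for illustration.

```php
<?php
// Hypothetical helper: scans a one-word-per-line file and returns the word
// with the smallest Levenshtein distance to $input, or null if none is found.
function closestWord(string $input, string $lexiconPath): ?string
{
    $handle = fopen($lexiconPath, 'r');
    if ($handle === false) {
        return null;
    }

    $best = null;
    $bestDistance = PHP_INT_MAX;

    // fgets() reads one line at a time, so the whole file is never held in memory.
    while (($line = fgets($handle)) !== false) {
        $word = trim($line);
        if ($word === '') {
            continue; // skip blank lines
        }
        $distance = levenshtein($input, $word);
        if ($distance < $bestDistance) {
            $best = $word;
            $bestDistance = $distance;
            if ($distance === 0) {
                break; // exact match, nothing can be closer
            }
        }
    }

    fclose($handle);
    return $best;
}
```

Calling closestWord($string1, 'words.txt') would then return the nearest entry, where words.txt stands in for the lexicon file. Edit distance favours words that differ by a character or two, which matches the top ranking described in the question. It gives no special credit to the same letters in a different order, so that rule would need a separate check.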

Mark Baker
