I'm trying to build a spell checking system in PHP.

I already have a lexicon (corpus) of many English words stored in a text file, one word per line.

Given an input string, call it $string1, I want to open the text file, search it for the word most similar to $string1, and return that word.

To search the text file, my idea is to load its contents into an array and then use in_array(). If there is a better way to search the file without loading every word into memory, please let me know.
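One way to check for an exact match without holding the whole lexicon in memory might be to read the file one line at a time. This is only a minimal sketch: the function name wordInLexicon() and the case-insensitive comparison are assumptions for illustration, not something the project requires.

```php
<?php
// Hypothetical helper: returns true if $word appears as a line in the lexicon
// file at $path. Only one line is held in memory at a time.
function wordInLexicon(string $path, string $word): bool
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    // Assumption: matching should ignore case and surrounding whitespace.
    $needle = strtolower(trim($word));

    try {
        while (($line = fgets($handle)) !== false) {
            if (strtolower(trim($line)) === $needle) {
                return true;
            }
        }
    } finally {
        // Runs on both the early return and the normal exit.
        fclose($handle);
    }

    return false;
}
```

This still scans the file linearly on every lookup, so for many lookups against the same lexicon an in-memory array keyed by word (checked with isset()) would likely be faster than in_array(), at the cost of memory.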

I don't know how to compare two strings for similarity. How would I do that?

In my mind, a word with the same letters in a different order would rank higher than one with different letters, but a partial match that is off by a character or two would rank higher than both.

I would greatly appreciate any help with opening the text file, searching it, and comparing the strings.

irfan mir
  • Why not use a database? Then you can index your words for a much faster search – Mark Baker Jun 01 '13 at 15:11
  • As for similarity: metaphone(), levenshtein(), soundex()? – Mark Baker Jun 01 '13 at 15:12
  • @MarkBaker This project is part of a #nosql movement so I can't. What are those functions? I haven't heard of them. – irfan mir Jun 01 '13 at 16:25
  • They're basically algorithms for measuring the difference between words: typically you use them with a threshold of commonality (e.g. 90% similarity) – Mark Baker Jun 01 '13 at 16:31
  • Seems to me that this is one instance where #nosql is being taken to the extremes of refusing to use the best tool for the job – Mark Baker Jun 01 '13 at 16:31
  • @MarkBaker haha maybe the #nosql movement has gone too far. But, I am interested in these functions. Could you post an answer with more on them and how I can use them? – irfan mir Jun 01 '13 at 21:06

1 Answer

References for the three algorithms that I mentioned in the comments: metaphone(), levenshtein(), and soundex().
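As a rough illustration of how these functions could be applied to the lexicon file, here is a minimal sketch. The helper names closestWord() and soundsAlike() and the default threshold of 2 edits are assumptions chosen for the example; levenshtein(), soundex(), and metaphone() are the built-in PHP functions.

```php
<?php
// Hypothetical helper: scan the lexicon line by line and return the word with
// the smallest Levenshtein (edit) distance to $input, or null when no word is
// within $maxDistance edits. Ties go to the word that appears first.
function closestWord(string $path, string $input, int $maxDistance = 2): ?string
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $input = strtolower(trim($input));
    $best = null;
    $bestDistance = $maxDistance + 1;

    while (($line = fgets($handle)) !== false) {
        $word = strtolower(trim($line));
        if ($word === '') {
            continue; // skip blank lines
        }
        // Number of single-character insertions, deletions, or substitutions.
        $distance = levenshtein($input, $word);
        if ($distance < $bestDistance) {
            $best = $word;
            $bestDistance = $distance;
            if ($distance === 0) {
                break; // exact match, nothing can beat it
            }
        }
    }

    fclose($handle);
    return $best;
}

// Hypothetical helper: true when two words share a phonetic key, which can
// catch misspellings that sound right but are many edits apart.
function soundsAlike(string $a, string $b): bool
{
    return soundex($a) === soundex($b) || metaphone($a) === metaphone($b);
}
```

For example, levenshtein('kitten', 'sitting') is 3, so with the default threshold of 2 those two words would not be treated as a match. Plain Levenshtein counts two swapped letters as two edits, so ranking reordered letters separately would need extra logic on top of this.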

Mark Baker