
I am working on online review data that is full of "Internet lingo", and I want to do some lexicon analysis on the words. Long story short, I want a spell checker that can take into account the language used on the internet. After some research, I found two approaches:

  1. Text Brew, which is a modified version of edit distance.
  2. Metaphone, which uses a sound-based approach.
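
To make the first option concrete, here is a minimal sketch of an edit distance with a separate cost per operation. It is not the Text Brew algorithm itself, only an illustration of the weighted edit distance idea it builds on. The class name, the cost constants, and the `closest` helper are all hypothetical, and the small dictionary is made up for the example.

```java
import java.util.Arrays;
import java.util.List;

/** Weighted edit distance sketch; an illustration, not the actual Text Brew implementation. */
class WeightedEditDistance {
    // Hypothetical costs; tune them for your data (e.g. make substitutions cheaper than insertions).
    static final double INSERT_COST = 1.0;
    static final double DELETE_COST = 1.0;
    static final double SUBSTITUTE_COST = 1.0;

    /** Minimum total cost of turning source into target using insert, delete, and substitute. */
    static double distance(String source, String target) {
        int n = source.length();
        int m = target.length();
        double[][] d = new double[n + 1][m + 1];

        // Turning a prefix of source into the empty string takes only deletions.
        for (int i = 1; i <= n; i++) {
            d[i][0] = d[i - 1][0] + DELETE_COST;
        }
        // Turning the empty string into a prefix of target takes only insertions.
        for (int j = 1; j <= m; j++) {
            d[0][j] = d[0][j - 1] + INSERT_COST;
        }
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                boolean same = source.charAt(i - 1) == target.charAt(j - 1);
                double substitute = d[i - 1][j - 1] + (same ? 0.0 : SUBSTITUTE_COST);
                double delete = d[i - 1][j] + DELETE_COST;
                double insert = d[i][j - 1] + INSERT_COST;
                d[i][j] = Math.min(substitute, Math.min(delete, insert));
            }
        }
        return d[n][m];
    }

    /** Returns the dictionary word with the lowest distance to word (first one wins on ties). */
    static String closest(String word, List<String> dictionary) {
        String best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        for (String candidate : dictionary) {
            double cost = distance(word, candidate);
            if (cost < bestCost) {
                bestCost = cost;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up dictionary, only for demonstration.
        List<String> dictionary = Arrays.asList("awesome", "assume", "aweful");
        System.out.println(closest("awsome", dictionary)); // prints "awesome"
    }
}
```

With all three costs set to 1.0 this reduces to plain Levenshtein distance. Changing the constants is where a weighted variant would start to behave differently on review text.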

PS. I will be parsing the data to clean out some of the net lingo like "lol", "lmao", etc. My only concern is misspelled words, and I am working in Java.

  • Which approach should I take in my case? – rahul.tejwani Mar 08 '14 at 00:56
  • Metaphone would work better for names, standard words, and acronyms (pronounceable initialisms), but would probably be near useless with initialisms like "lmao" that get spelled out. Particularly since it discards vowels. "lmao" looks just like "elm" or "lamb" to it. – cHao Mar 08 '14 at 00:58
  • Thanks for the quick response. Are you aware of any open source library that I can try for Metaphone? I will be cleaning out initialisms like "lmao" before using the spell checker. – rahul.tejwani Mar 08 '14 at 01:04
  • For Java, I dunno. I've only ever used it in PHP, which has it built in. The algorithm is on Wikipedia, though; you could easily whip up your own implementation if you can't find one. – cHao Mar 08 '14 at 01:05
  • Thanks again. I found out that Apache Commons Codec supports Metaphone and DoubleMetaphone. :) Will try it out and update this thread with the results. – rahul.tejwani Mar 08 '14 at 01:29

0 Answers