
I'm aware of algorithms for finding words that sound alike, such as the ones supported by the Fuzzy library.

But how do I search in reverse? That is, given a phonetic symbol like /θ/, how do I search a document for every word that contains that sound, such as earth, thigh, throw, bath?

Marconi
  • Could you elaborate a bit more on "searching by reverse"? I understand the first ones you are talking about are the Metaphone/Soundex algorithms. – Yavar Jan 07 '14 at 08:47
  • 2
    It is not a simple task. You would need a way to transform the orthographic form (i.e., normal spelling) into a phonetic form. [CMUdict](http://www.speech.cs.cmu.edu/cgi-bin/cmudict) and [CELEX](http://celex.mpi.nl/) are two pronouncing dictionaries that you could use to get started. NLTK has [some related facilities](http://nltk.org/_modules/nltk/corpus/reader/cmudict.html). – BrenBarn Jan 07 '14 at 08:49
  • 1
    @Yavar "reverse" might be a poor choice of word there, but basically I want to get all words that match a phonetic symbol. – Marconi Jan 07 '14 at 08:56
  • @BrenBarn thanks, I'll look into it further. – Marconi Jan 07 '14 at 09:00
  • Welcome to Stack Overflow! It looks like you want us to write some code for you. While many users are willing to produce code for a coder in distress, they usually only help when the poster has already tried to solve the problem on their own. A good way to demonstrate this effort is to include the code you've written so far, example input (if there is any), the expected output, and the output you actually get (console output, stack traces, compiler errors - whatever is applicable). The more detail you provide, the more answers you are likely to receive. Check the [FAQ] and [ask] – Inbar Rose Jan 07 '14 at 09:01
  • 1
    @InbarRose I know how SO works, the thing is I don't know where to start so I'm asking for some pointers. If you look at my history then you'll see that I provide code first if I actually know where to start. :) – Marconi Jan 07 '14 at 09:33
  • 1
    @Yavar I think I found a solution, probably not the best, but it might work. Basically, use espeak to convert words to IPA so that I now have one-to-one matches between sounds and symbols. Then I can index the words by phonetic symbol. – Marconi Jan 07 '14 at 09:36
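
The approach from the last comment (transcribe each word to IPA, then index words by phonetic symbol) could be sketched roughly as below. This is only a minimal illustration, not a tested solution: `TOY_IPA`, `build_phoneme_index`, and `words_with_sound` are hypothetical names, and the hand-written transcriptions stand in for whatever espeak or a pronouncing dictionary such as CMUdict would actually produce.

```python
import re
from collections import defaultdict

# Hand-written transcriptions for illustration only (an assumption).
# In practice these would come from espeak or a pronouncing dictionary.
TOY_IPA = {
    "earth": "ɜːθ",
    "thigh": "θaɪ",
    "throw": "θɹəʊ",
    "bath": "bɑːθ",
    "this": "ðɪs",
    "tree": "tɹiː",
}


def build_phoneme_index(text, pronunciations):
    """Map each IPA character to the set of words in `text` that contain it."""
    index = defaultdict(set)
    for word in re.findall(r"[a-z']+", text.lower()):
        ipa = pronunciations.get(word)
        if ipa is None:
            continue  # no transcription available, so the word is skipped
        for symbol in ipa:
            index[symbol].add(word)
    return index


def words_with_sound(index, symbol):
    """Return the indexed words containing `symbol`, in alphabetical order."""
    return sorted(index.get(symbol, ()))


document = "This tree fell to earth near the bath. Throw it over your thigh."
index = build_phoneme_index(document, TOY_IPA)
print(words_with_sound(index, "θ"))  # ['bath', 'earth', 'thigh', 'throw']
```

Note that this indexes one Unicode character at a time, so diphthongs, affricates, and length marks are not treated as single units; a real version would need to tokenize the transcription into phonemes first.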

0 Answers