
Say you want to take input from CMU's phonetic data set that looks like this:

ABERRATION  AE2 B ER0 EY1 SH AH0 N
ABERRATIONAL  AE2 B ER0 EY1 SH AH0 N AH0 L
ABERRATIONS  AE2 B ER0 EY1 SH AH0 N Z
ABERT  AE1 B ER0 T
ABET  AH0 B EH1 T
ABETTED  AH0 B EH1 T IH0 D
ABETTING  AH0 B EH1 T IH0 NG
ABEX  EY1 B EH0 K S
ABEYANCE  AH0 B EY1 AH0 N S

(The word is on the left; to its right is the sequence of phonemes, key here.)
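To make the format concrete, here is a minimal Python sketch that splits each line into a word and its phoneme list. The names `parse_entry` and `strip_stress` are made up for illustration, and the sketch assumes that the trailing digit on a vowel (0, 1 or 2) is a stress marker that can be dropped when only the base phoneme matters.

```python
def parse_entry(line):
    """Split one dictionary line into (word, [phonemes])."""
    # The first whitespace-separated token is the word, the rest are phonemes.
    word, *phonemes = line.split()
    return word, phonemes


def strip_stress(phoneme):
    """Drop a trailing stress digit (assumed to be 0, 1 or 2), e.g. 'EH1' -> 'EH'."""
    return phoneme.rstrip("012")


sample = """\
ABERT  AE1 B ER0 T
ABET  AH0 B EH1 T
ABEX  EY1 B EH0 K S
"""

# Skip blank lines and parse the rest.
entries = [parse_entry(line) for line in sample.splitlines() if line.strip()]

for word, phonemes in entries:
    print(word, [strip_stress(p) for p in phonemes])
```

Each entry ends up as a `(word, phonemes)` pair, which is a convenient shape for any later training step.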

And you want to use it as training data for a machine learning system that would take new words and guess how they would be pronounced in English.

It's not obvious to me, at least, because there isn't a fixed number of letters that can map to a phoneme. I have a feeling that something involving a Markov chain might be the right way to go.
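The sample shows the mismatch: ABEX has four letters but five phonemes (X becomes K S), while ABERT has five letters but four phonemes. As a rough sketch of what the Markov chain idea could look like, the Python below counts first-order transitions between phonemes for three of the sample words. It only models which phoneme tends to follow which; it does not map letters to phonemes, which would need a separate alignment step. The function names and the `<s>` / `</s>` boundary markers are my own choices, not part of the data set.

```python
from collections import Counter, defaultdict

# Pronunciations copied from the sample above.
pronunciations = [
    ["AH0", "B", "EH1", "T"],               # ABET
    ["AH0", "B", "EH1", "T", "IH0", "D"],   # ABETTED
    ["AH0", "B", "EH1", "T", "IH0", "NG"],  # ABETTING
]

START, END = "<s>", "</s>"  # hypothetical boundary markers


def transition_counts(seqs):
    """Count how often each phoneme is followed by each other phoneme."""
    counts = defaultdict(Counter)
    for seq in seqs:
        padded = [START, *seq, END]
        for prev, nxt in zip(padded, padded[1:]):
            counts[prev][nxt] += 1
    return counts


def transition_prob(counts, prev, nxt):
    """Maximum-likelihood estimate of P(nxt | prev); 0.0 if prev was never seen."""
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0


counts = transition_counts(pronunciations)
print(transition_prob(counts, "T", "IH0"))  # T is followed by IH0 in 2 of 3 words
```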

How would you do this?

ʞɔıu
  • One thing to keep in mind is that both the CMU and Moby data are for American pronunciation and don't have a very good set of phonemes for British or other English varieties. In fact, even the CMU and Moby data use different sets of phonemes. The Moby Pronunciator is here: http://icon.shef.ac.uk/Moby/mpron.html – hippietrail May 09 '11 at 04:38

2 Answers


The problem is called grapheme-to-phoneme conversion, a subproblem of natural language processing. Searching for that term brings up a few papers.

Frank

Not entirely my field, but maybe build a neural network with several layers: earlier layers to guess how the word splits into sequential syllables, and later layers to guess the pronunciation of those syllables.

Setting up an ANFIS-learning neural network is fairly straightforward for numerical data; for literal/phonetic data the task is undoubtedly several orders of magnitude more complex.

Jukka Dahlbom
  • Can you really have an NN with a variable number of output nodes? – ʞɔıu Mar 24 '09 at 01:50
  • I believe so. Quick googling suggests it is easier to train the networks separately and then combine them to achieve several outputs. This problem is far from trivial, and I don't claim to be able to really solve it. – Jukka Dahlbom Mar 24 '09 at 08:22
  • Would you really need a variable number of output nodes? Unless the number of phonemes is prohibitively large, just have as many output nodes as possible phonemes. – bubaker May 20 '09 at 01:53
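The last comment's suggestion, one output node per possible phoneme, amounts to a one-hot encoding over the phoneme inventory. Here is a minimal Python sketch of that encoding, assuming the inventory is simply collected from the training pronunciations; `build_inventory` and `one_hot` are illustrative names. It only fixes the width of a single output step, so the variable number of phonemes per word still has to be handled separately.

```python
def build_inventory(pronunciations):
    """Collect every distinct phoneme, sorted so that indices are stable."""
    return sorted({p for seq in pronunciations for p in seq})


def one_hot(phoneme, inventory):
    """Return a vector with a 1 at the phoneme's index and 0 elsewhere."""
    vec = [0] * len(inventory)
    vec[inventory.index(phoneme)] = 1
    return vec


# Two pronunciations from the sample data (ABERT and ABET).
pronunciations = [
    ["AE1", "B", "ER0", "T"],
    ["AH0", "B", "EH1", "T"],
]

inventory = build_inventory(pronunciations)
print(inventory)                # one output node per entry
print(one_hot("B", inventory))  # target vector for the phoneme B
```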