
We are building a project that needs to speak Arabic, and we've found an open source tool, the Mbrola project, that can do this.

However, I also need some way to convert the Arabic text to SAMPA phonetic notation. Could anyone help me convert Arabic text to SAMPA?

remio
JimmyKomy

1 Answer


This is much easier to do for Arabic than for, say, English, because Arabic spelling is basically phonetic. There are a few small exceptions, but they follow highly regular rules, so they should be easy to handle. For example:

  1. Harf Shamsia (sun letters) ... Al + a sun letter (for example n, t, d, dh, l, s, sh): the "l" assimilates to the first letter of the word that follows.
  2. Harf Qamaria (moon letters) ... Al + (anything else): the "l" sound remains.
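The two rules above can be sketched in Python roughly as follows. This is only a minimal sketch under several assumptions that are not part of the answer: the function name `article_phonemes` is hypothetical, the caller supplies the letter-to-SAMPA table (symbol inventories differ between voices, so check the phoneme list of the Mbrola voice you use), the article vowel is written simply as `a`, and any word starting with alef + lam is treated as carrying the article, which is not always true in real text.

```python
# Alef + lam, the written form of the definite article "al-".
ARTICLE = "\u0627\u0644"

# The 14 sun letters: ta, tha, dal, dhal, ra, zay, sin, shin,
# sad, dad, tah, zah, lam, nun.
SUN_LETTERS = set(
    "\u062A\u062B\u062F\u0630\u0631\u0632\u0633\u0634"
    "\u0635\u0636\u0637\u0638\u0644\u0646"
)

def article_phonemes(word, sampa):
    """Return SAMPA symbols for a leading article, or None if there is none.

    `sampa` maps Arabic letters to the symbols of the target voice. It is
    supplied by the caller (an assumption of this sketch) because the
    symbol set depends on the voice.
    """
    if len(word) <= len(ARTICLE) or not word.startswith(ARTICLE):
        return None
    first = word[len(ARTICLE)]
    if first in SUN_LETTERS:
        # Sun letter: the "l" is replaced by a copy of the next consonant.
        return ["a", sampa[first]]
    # Moon letter: the "l" is pronounced as written.
    return ["a", "l"]

# Tiny demo table (shin -> "S"); a real table would cover every letter.
demo = {"\u0634": "S"}
print(article_phonemes("\u0627\u0644\u0634\u0645\u0633", demo))  # ash-shams
print(article_phonemes("\u0627\u0644\u0642\u0645\u0631", demo))  # al-qamar
```

The first call should give `['a', 'S']` (assimilated "l") and the second `['a', 'l']`. The phonemes of the rest of the word would still come from the ordinary letter-by-letter lookup.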

Here is a simple lookup table. (Presumably you read Arabic, or have someone who can help you with that?) You will need to take into account the text encoding of your Arabic source: is it Unicode (and if so, which Unicode encoding, e.g. UTF-8 or UTF-16), or code page 1256?
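As a rough illustration of the encoding point, the sketch below decodes raw bytes before any letter lookup happens. The helper name `decode_arabic` and the fallback order are assumptions made for this example: it tries strict UTF-8 first (accepting an optional byte order mark) and falls back to code page 1256 only if that fails. UTF-16 input is not handled, and a real project may know its source encoding up front and not need to guess at all.

```python
def decode_arabic(raw: bytes) -> str:
    """Decode bytes as UTF-8 if possible, otherwise as code page 1256."""
    try:
        # "utf-8-sig" behaves like UTF-8 but drops a leading BOM if present.
        return raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Arabic text in cp1256 is almost never valid UTF-8, so a decoding
        # failure is used here as the signal to fall back.
        return raw.decode("cp1256")

word = "\u0643\u062A\u0627\u0628"  # kitab ("book")
print(decode_arabic(word.encode("utf-8")) == word)
print(decode_arabic(word.encode("cp1256")) == word)
```

Once the text is a Python `str`, each Arabic letter is a single Unicode code point, which is what a letter-to-SAMPA table would be keyed on.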

That should be enough to get you started. Good luck!

Ahmed Ashour
happy coder
  • That table doesn't let you start with an Arabic letter, look it up, and find a corresponding SAMPA phoneme. What did you mean by calling it a lookup table? – Camille Goudeseune May 02 '18 at 21:31
  • Look at the Arabic character you want to represent with the equivalent SAMPA character, then go to the left-hand side of the chart. That is the SAMPA character that represents the Arabic character you are interested in. So, the left-hand side of the chart is what you are looking for to replace the Arabic characters. – happy coder May 04 '18 at 17:50
  • OK, I looked for ب . I found 9 instances in the table. By reading the words containing that letter, I eventually guessed that it maps to `b`. But that's not a lookup table, that's as complicated as the Rosetta Stone. – Camille Goudeseune May 04 '18 at 18:09
  • Ok, I just assumed that people who were using the table already knew how to read Arabic. The first character of the Arabic word on the right-hand side, i.e. the rightmost character, is the character represented by the SAMPA character on the left-hand side of the column. I suppose, technically speaking, what you were looking for was a one-to-one correspondence. That is not what this is; it shows you how SAMPA is used in practice in orthography. One other thing, realizing that some of you don't read Arabic: the letters have different shapes depending on their position in the word. – happy coder May 08 '18 at 16:30
  • Further to this comment ... have a look at: [alphabet chart](http://web.stanford.edu/dept/lc/arabic/alphabet/incontextletters.html) – happy coder May 08 '18 at 16:33