I am exploring how to use CFStringTransform to transliterate Hebrew text, and I am stuck on a few inconsistencies: letters that should be pronounced differently are transliterated in exactly the same way, and some special cases are not taken into account by Apple's algorithm.
Kaf (כּ → K) vs Khaf (כ → Ḵ)
כִּי ("because")
    let string = NSMutableString(string: "כִּי")
    CFStringTransform(string, nil, kCFStringTransformLatinHebrew, true)
    print(string) // prints "ki̇y"
שָׁכָחְתִּי ("I forgot")
    let string = NSMutableString(string: "שָׁכָחְתִּי")
    CFStringTransform(string, nil, kCFStringTransformLatinHebrew, true)
    print(string) // prints "şá̌káẖĕţi̇y" instead of "şá̌ḵáẖĕţi̇y"
While the kaf in כִּי is pronounced like a K in English, the khaf in שָׁכָֽחְתִּי is pronounced as in loch or Bach and it's typically transliterated as CH, KH or Ḵ. However, both letters are transliterated as K.
Pei (פּ → P) vs Fei (פ → F)
פַּרְעֹה ("pharaoh")
    let string = NSMutableString(string: "פַּרְעֹה")
    CFStringTransform(string, nil, kCFStringTransformLatinHebrew, true)
    print(string) // prints "pȧrĕʻòh"
יוֹסֵף ("Joseph")
    let string = NSMutableString(string: "יוֹסֵף")
    CFStringTransform(string, nil, kCFStringTransformLatinHebrew, true)
    print(string) // prints "ywòsép" instead of "ywòséf"
While the pei in פַּרְעֹה is pronounced like a P would be pronounced in English (and transliterated accordingly), the (trailing) fei in יוֹסֵף is pronounced like an F (and transliterated accordingly). However, both are transliterated with a P.
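In pointed text, both the kaf/khaf and the pei/fei distinctions are carried by the dagesh (U+05BC) that follows the base letter. As a rough sketch of that distinction only (it does not change what CFStringTransform produces), the following lists the kaf and pe letters in a word that lack a dagesh. The function name `softKafAndPe(in:)` is hypothetical, and the sketch assumes fully pointed input and ignores rarer cases.

```swift
// Hypothetical helper, not part of any Apple API.
// Returns every kaf/pe (including the final forms) in `word` that is NOT
// followed by a dagesh, i.e. the letters one would expect to read as khaf/fei.
// Assumption: the input is fully pointed (vowelized) Hebrew.
func softKafAndPe(in word: String) -> [Unicode.Scalar] {
    let dagesh: Unicode.Scalar = "\u{05BC}"
    // Final kaf, kaf, final pe, pe.
    let candidates: Set<Unicode.Scalar> = ["\u{05DA}", "\u{05DB}", "\u{05E3}", "\u{05E4}"]
    let scalars = Array(word.unicodeScalars)
    var result: [Unicode.Scalar] = []

    for (index, scalar) in scalars.enumerated() where candidates.contains(scalar) {
        // Collect the marks attached to this letter: everything up to the
        // next Hebrew base letter (U+05D0...U+05EA).
        let marks = scalars[(index + 1)...].prefix(while: { mark in
            !(0x05D0...0x05EA).contains(mark.value)
        })
        if !marks.contains(dagesh) {
            result.append(scalar)
        }
    }
    return result
}

// Usage with two of the words above.
print(softKafAndPe(in: "כִּי").map { String($0) })
print(softKafAndPe(in: "יוֹסֵף").map { String($0) })
```

Working on Unicode scalars rather than on `Character` values keeps the base letter and its combining marks separate, which is what the check needs.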
Trailing consonants with pataḥ g'nuva
From the article on Hebrew vocalization in the English Wikipedia:
A patach on a letters ח, ע, ה at the end of a word is sounded before the letter, and not after. Thus, נֹחַ (Noah) is pronounced /ˈno.ax/. This only occurs at the ends of words and only with patach and ח, ע, and הּ (that is, ה with a dot (mappiq) in it). This is sometimes called a patach ganuv, or "stolen" patach (more formally, "furtive patach"), since the sound "steals" an imaginary epenthetic consonant to make the extra syllable.
However:
תַפּוּחַ ("apple")
    let string = NSMutableString(string: "תַפּוּחַ")
    CFStringTransform(string, nil, kCFStringTransformLatinHebrew, true)
    print(string) // prints "ţaṗẇẖa" instead of "ţaṗẇaẖ"
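The rule quoted above can be expressed as a small check: the last base letter of the word is ח or ע, or ה with a mappiq, and it carries a patach. This is a minimal sketch under that reading of the rule. The name `endsWithFurtivePatach(_:)` is hypothetical, and the function only detects the case; it does not reorder the transliterated output.

```swift
// Hypothetical helper, not part of any Apple API.
// Returns true when the final base letter of `word` is ח or ע, or ה with a
// mappiq, and that letter carries a patach.
// Assumption: `word` is a single pointed Hebrew word.
func endsWithFurtivePatach(_ word: String) -> Bool {
    let patach: Unicode.Scalar = "\u{05B7}"
    let mappiq: Unicode.Scalar = "\u{05BC}" // same code point as dagesh
    let het: Unicode.Scalar = "\u{05D7}"
    let ayin: Unicode.Scalar = "\u{05E2}"
    let he: Unicode.Scalar = "\u{05D4}"

    let scalars = Array(word.unicodeScalars)
    // Locate the last Hebrew base letter (U+05D0...U+05EA).
    guard let baseIndex = scalars.lastIndex(where: { (0x05D0...0x05EA).contains($0.value) }) else {
        return false
    }
    // Everything after the last base letter is treated as its marks.
    let marks = scalars[(baseIndex + 1)...]
    guard marks.contains(patach) else { return false }

    switch scalars[baseIndex] {
    case het, ayin:
        return true
    case he:
        return marks.contains(mappiq)
    default:
        return false
    }
}

// Usage with the word from the example above.
print(endsWithFurtivePatach("תַפּוּחַ"))
```

Checking membership in the mark sequence, rather than a fixed position, avoids depending on whether the patach or the mappiq comes first in the string.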
Q: How can I change the behavior of CFStringTransform to account for these three cases?
From the reference for CFMutableString, we see that CFStringTransform takes the following as its transform parameter:
A CFString object that identifies the transformation to apply. For a list of valid values, see Transform Identifiers for CFStringTransform. On OS X v10.4 and later, you can also use any valid ICU transform ID defined in the ICU User Guide for Transforms.
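Going by that description, passing an ICU transform ID instead of one of the predefined constants might look like the sketch below. The wrapper name `transliterate(_:using:)` is hypothetical, and `"Hebrew-Latin"` is used on the assumption that this ICU ID is accepted on the platform in question. Note that this only selects a built-in ICU transform; it does not supply custom rules.

```swift
import Foundation

// Hypothetical wrapper; the name `transliterate(_:using:)` is illustrative only.
// Returns nil when CFStringTransform reports failure.
func transliterate(_ input: String, using transformID: String) -> String? {
    let buffer = NSMutableString(string: input)
    // The ICU ID is passed as a CFString; `false` applies the transform in
    // its forward direction (here: Hebrew to Latin).
    let succeeded = CFStringTransform(buffer, nil, transformID as CFString, false)
    return succeeded ? (buffer as String) : nil
}

// Assumption: "Hebrew-Latin" is a valid ICU transform ID on this system.
if let result = transliterate("כִּי", using: "Hebrew-Latin") {
    print(result)
}
```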
The documentation suggests that the rules for ICU transforms are flexible enough to be customized. There is even a rule editor that can be accessed from the ICU playground. However, although I have found a Stack Overflow question that deals with something tangentially similar, I cannot find a clearly documented way of doing this for RTL languages.