I'm trying to get the syllables of a word using pyhyphen. With the English dictionary, the apostrophe is handled correctly, in my opinion:
import hyphen
h = hyphen.Hyphenator('en_US')
h.syllables(u"Hammond's")
The apostrophe is simply kept inside one syllable:
[u'Ham', u"mond's"]
But if I do the same using the German dictionary:
h = hyphen.Hyphenator('de_CH')
h.syllables(u"Hammond's")
h.syllables(u"Bismarck'sche")
the apostrophe is treated as if it were its own syllable:
[u'Ham', u'mond', u"'s"]
[u'Bis', u'marck', u"'", u'sche']
I was wondering whether it is possible to define exceptions (places not to break) for certain characters, as is possible in LaTeX.
The workaround that came to my mind was to look for a leading apostrophe in each syllable and concatenate that syllable with the previous one:
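syls = [u'Bis', u'marck', u"'", u'sche']
syls2 = []
for syl in syls:
    if syl.startswith("'"):
        if not syls2:
            # nothing to attach to yet, so keep it as its own syllable
            syls2.append(syl)
        else:
            # glue the apostrophe syllable onto the previous one
            syls2[-1] += syl
    else:
        syls2.append(syl)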
[u'Bis', u"marck'", u'sche']
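Wrapped up as a reusable function, the same workaround might look roughly like the sketch below. The name `merge_leading` and the `glue` parameter are my own and not part of pyhyphen; I'm also assuming that the typographic apostrophe (U+2019) should be treated the same way as the ASCII one.

```python
def merge_leading(syllables, glue=(u"'", u"\u2019")):
    """Attach every syllable that starts with a glue character to the previous one.

    `glue` is a tuple of characters (hypothetical parameter, my own naming).
    """
    merged = []
    for syl in syllables:
        if merged and syl.startswith(glue):
            # there is a previous syllable: append the apostrophe part to it
            merged[-1] += syl
        else:
            # ordinary syllable, or a glue syllable with nothing before it
            merged.append(syl)
    return merged


print(merge_leading([u'Ham', u'mond', u"'s"]))
print(merge_leading([u'Bis', u'marck', u"'", u'sche']))
```

It only post-processes the list returned by `h.syllables(...)`, so for "Bismarck'sche" it still leaves a break point right after the apostrophe.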
But this is neither a nice nor a general solution. More broadly, I'd like to know how to define hyphenation rules for words that are hyphenated incorrectly.