Is there a way to divide an Ancient Greek text (UTF-8) into syllables in R? I need to count the number of unique syllables in a corpus.

I cannot find an existing algorithm for this, and the rules are too complicated to implement from scratch.

1 Answer

Based on the sylly vignette (https://cran.r-project.org/web/packages/sylly/vignettes/sylly_vignette.html#fn2), here is a solution:

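library(sylly)  # provides read.hyph.pat(), hyphen() and hyphen_df()

sample.text <- "Μουσάων Ἑλικωνιάδων ἀρχώμεθ' ἀείδειν"
# hyphen() expects a character vector of words, so split on spaces first
sample.words <- strsplit(sample.text, " ", fixed=TRUE)[[1]]

# load the Ancient Greek hyphenation patterns from CTAN
url.grc.pattern <- url("http://tug.ctan.org/tex-archive/language/hyph-utf8/tex/generic/hyph-utf8/patterns/txt/hyph-grc.pat.txt")
hyph.grc <- read.hyph.pat(url.grc.pattern, lang="grc")
close(url.grc.pattern)

# hyphen() returns a sylly object; hyphen_df() returns a plain data frame
hyph.obj.grc <- hyphen(sample.words, hyph.pattern=hyph.grc)
hyph.txt.grc <- hyphen_df(sample.words, hyph.pattern=hyph.grc)
class(hyph.txt.grc$word) # character vector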

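Some words are not hyphenated correctly, though: the patterns are TeX hyphenation patterns, which mark permissible line-break points rather than strict syllable boundaries.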
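
To get from the hyphenated words to the number of unique syllables the question asks for, something like the following base R sketch may work. It assumes the `word` column holds each word with `-` between syllables; the `hyphenated` vector below is a hypothetical stand-in for `hyph.txt.grc$word`, and its split points are illustrative rather than actual sylly output.

```r
# Hypothetical hyphenated words standing in for hyph.txt.grc$word
# (assumption: syllables are separated by "-").
hyphenated <- c("Μου-σά-ων", "Ἑ-λι-κω-νι-ά-δων", "ἀρ-χώ-μεθ'", "ἀ-εί-δειν")

# Split every word on the hyphen and flatten into one vector of syllables.
syllables <- unlist(strsplit(hyphenated, "-", fixed = TRUE))

# Frequency of each syllable, most frequent first.
syll.freq <- sort(table(syllables), decreasing = TRUE)

# Number of distinct syllables in the corpus.
n.unique <- length(unique(syllables))
```

Syllables are compared as exact strings here, so forms that differ only in case, accent, or Unicode normalization are counted separately; for a real corpus it may be worth normalizing the text before counting.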