
Thanks in advance for the help. I am trying to parametrically knit an R Markdown (.Rmd) file into PDF and HTML output in several languages, one of which is Armenian. Knitting the Armenian version to any format always produces literal Unicode escapes, like <U+0553><U+0578><U+0580><U+0571><U+0561><U+057C><U+0578>.
The .tex files are rendered as follows: \textless U+0553\textgreater\textless U+0578\textgreater\textless U+0580\textgreater\textless U+0571\textgreater
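
To check which characters those escapes stand for, something like the following could be used. It is only a sketch: `decode_u_escapes` is a hypothetical helper name, not part of knitr or rmarkdown, and it assumes the escapes always have the `<U+XXXX>` shape shown above.

```r
# Hypothetical helper: turn "<U+0553>"-style escapes back into real characters.
decode_u_escapes <- function(x) {
  pattern <- "<U\\+[0-9A-Fa-f]{4,6}>"
  decode_one <- function(s) {
    m <- gregexpr(pattern, s)
    hits <- regmatches(s, m)[[1]]
    if (length(hits) == 0L) return(s)  # nothing to decode
    # Strip the leading "<U+" and the trailing ">" to keep only the hex digits.
    hex <- substr(hits, 4L, nchar(hits) - 1L)
    chars <- vapply(strtoi(hex, 16L), intToUtf8, character(1))
    # Text between the escapes, interleaved with the decoded characters.
    pieces <- regmatches(s, m, invert = TRUE)[[1]]
    paste0(c(rbind(pieces, c(chars, ""))), collapse = "")
  }
  vapply(x, decode_one, character(1), USE.NAMES = FALSE)
}

# The escaped string from the question.
escaped <- "<U+0553><U+0578><U+0580><U+0571><U+0561><U+057C><U+0578>"
decoded <- decode_u_escapes(escaped)
utf8ToInt(decoded)  # code points of the decoded text
```

On a session whose native encoding cannot represent Armenian, printing `decoded` may show the escapes again, so inspecting the code points with `utf8ToInt()` is the more reliable check.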

I have also tried using XeLaTeX and setting the main font to DejaVu Sans (i.e. latex_engine: xelatex and mainfont: DejaVu Sans), but the Armenian characters are still knitted as Unicode literals. When I edited the XeLaTeX .tex output by hand and put in the Armenian characters, it compiled fine. If Pandoc/R Markdown/knitr didn't convert the characters to literals, everything would work perfectly.

Even the .knit.md file already has the Unicode literals, so the problem is high up in the knitting chain, and I am not sure why this might be.
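
Since the literals already appear in the .knit.md file, one thing that may be worth checking is whether the R session's native encoding can represent Armenian at all. The sketch below is an assumption about where to look, not a confirmed diagnosis; `has_u_escapes` is a hypothetical helper, and the sample text is built from the first three code points quoted above.

```r
# Sample Armenian text built from code points, so this script itself is pure ASCII.
armenian <- intToUtf8(c(0x0553, 0x0578, 0x0580))

# Hypothetical helper: TRUE if a string contains "<U+XXXX>"-style escapes.
has_u_escapes <- function(x) grepl("<U\\+[0-9A-Fa-f]{4,6}>", x)

report <- list(
  ctype = Sys.getlocale("LC_CTYPE"),      # e.g. a code page 1252 locale on Windows
  utf8_locale = l10n_info()[["UTF-8"]],   # TRUE if the session runs in a UTF-8 locale
  # enc2native() converts to the session's native encoding; characters it
  # cannot represent are expected to come back as <U+XXXX> escapes.
  escaped_after_native = has_u_escapes(enc2native(armenian))
)
str(report)
```

If `escaped_after_native` is `TRUE`, any step that passes text through the native encoding would be expected to produce the same literals; in a UTF-8 locale it should be `FALSE`.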

Any help would be appreciated.


Here's my minimal document (LuaLaTeX):

    ---
    title: "Test Doc"
    author: "Document"
    output:
      html_fragment:
        section_divs: no
      pdf_document:
        latex_engine: lualatex
    geometry: margin = 2cm
    params:
      lang: yes
    lang: "`r switch(params$lang, CY = 'cy-GB', EN = 'en-GB', HY = 'hy-AM')`"
    knit: (function(inputFile, encoding){ files <- c("cv-cy", "cv-en", "cv-hy"); langs <- c(list(lang = "CY"), list(lang = "EN"), list(lang = "HY")); i <- 1; for(f in files){ rmarkdown::render(inputFile, output_format = c("html_fragment", "pdf_document"), encoding = encoding, params = langs[i], output_file = c(paste0(dirname(inputFile), "/", print(f)), paste0(dirname(inputFile), "/", print(f)))); i <- i +1 } })
    ---

    ```{r setup, include=FALSE}
    knitr::opts_chunk$set(echo = FALSE)
    lswitch <- function(lang, ...){
      switch(lang, ..., stop(""))
    }
    ```

    `r lswitch(params$lang,
    CY = "# Enw",
    EN = "# Name",
    HY = "# Անում"
    )
    `
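
For readability, the one-line `knit:` hook above can be unrolled into a standalone function. This is a sketch that is meant to do the same thing, not a fix for the encoding problem; `render_all_languages` and its `dry_run` argument are hypothetical names added for illustration.

```r
# Sketch of the `knit:` hook as a standalone function (names are hypothetical).
render_all_languages <- function(input_file, encoding = "UTF-8", dry_run = FALSE) {
  # One row per language: the `lang` parameter value and the output file stem.
  jobs <- data.frame(
    lang = c("CY", "EN", "HY"),
    stem = c("cv-cy", "cv-en", "cv-hy"),
    stringsAsFactors = FALSE
  )
  jobs$output_file <- file.path(dirname(input_file), jobs$stem)
  if (dry_run) return(jobs)  # only show what would be rendered

  for (i in seq_len(nrow(jobs))) {
    rmarkdown::render(
      input_file,
      output_format = c("html_fragment", "pdf_document"),
      # Same stem for both formats, as in the original hook.
      output_file = rep(jobs$output_file[i], 2),
      params = list(lang = jobs$lang[i]),
      encoding = encoding
    )
  }
  invisible(jobs)
}

# Inspect the plan without calling rmarkdown.
render_all_languages("cv.Rmd", dry_run = TRUE)
```

Keeping the language codes and file stems in one table avoids the manual `i <- i + 1` counter and makes it easier to render a single language while debugging.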

Here's the pertinent section in the .log for the LuaLaTeX build:

    ) (d:/texlive/texmf-dist/tex/generic/babel/babel.sty
    Package: babel 2019/05/04 3.31 The Babel package
    (d:/texlive/texmf-dist/tex/generic/babel/switch.def
    File: switch.def 2019/05/04 3.31 Babel switching mechanism
    ) (d:/texlive/texmf-dist/tex/generic/babel/luababel.def
    \l@dumylang=\language2
    Package babel Info: Non-standard hyphenation setup on input line 114.
    \l@nohyphenation=\language3
    \l@german-x-2019-04-04=\language4
    \l@ngerman-x-2019-04-04=\language5
    \l@afrikaans=\language6
    \l@ancientgreek=\language7
    \l@ibycus=\language8
    \l@arabic=\language9
    \l@armenian=\language10
    \l@basque=\language11
    \l@belarusian=\language12
    \l@bulgarian=\language13
    \l@catalan=\language14
    \l@pinyin=\language15
    \l@churchslavonic=\language16
    \l@coptic=\language17
    \l@croatian=\language18
    \l@czech=\language19
    \l@danish=\language20
    \l@dutch=\language21
    \l@ukenglish=\language22
    \l@usenglishmax=\language23
    \l@esperanto=\language24
    \l@estonian=\language25
    \l@ethiopic=\language26
    \l@farsi=\language27
    \l@finnish=\language28
    \l@french=\language29
    \l@friulan=\language30
    \l@galician=\language31
    \l@georgian=\language32
    \l@german=\language33
    \l@ngerman=\language34
    \l@swissgerman=\language35
    \l@monogreek=\language36
    \l@greek=\language37
    \l@hungarian=\language38
    \l@icelandic=\language39
    \l@assamese=\language40
    \l@bengali=\language41
    \l@gujarati=\language42
    \l@hindi=\language43
    \l@kannada=\language44
    \l@malayalam=\language45
    \l@marathi=\language46
    \l@oriya=\language47
    \l@panjabi=\language48
    \l@pali=\language49
    \l@tamil=\language50
    \l@telugu=\language51
    \l@indonesian=\language52
    \l@interlingua=\language53
    \l@irish=\language54
    \l@italian=\language55
    \l@kurmanji=\language56
    \l@latin=\language57
    \l@classiclatin=\language58
    \l@liturgicallatin=\language59
    \l@latvian=\language60
    \l@lithuanian=\language61
    \l@mongolian=\language62
    \l@mongolianlmc=\language63
    \l@bokmal=\language64
    \l@nynorsk=\language65
    \l@occitan=\language66
    \l@piedmontese=\language67
    \l@polish=\language68
    \l@portuguese=\language69
    \l@romanian=\language70
    \l@romansh=\language71
    \l@russian=\language72
    \l@sanskrit=\language73
    \l@serbian=\language74
    \l@serbianc=\language75
    \l@slovak=\language76
    \l@slovenian=\language77
    \l@spanish=\language78
    \l@swedish=\language79
    \l@thai=\language80
    \l@turkish=\language81
    \l@turkmen=\language82
    \l@ukrainian=\language83
    \l@uppersorbian=\language84
    \l@welsh=\language85
    )

    ! Package babel Error: Unknown option `armenian'. Either you misspelled it
    (babel)                or the language definition file armenian.ldf was not fou
    nd.

    See the babel package documentation for explanation.
    Type  H <return>  for immediate help.
     ...                                              

    l.533   \ExecuteOptions{\bbl@opt@main}

Session info:

    R version 3.6.1 (2019-07-05)
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    Running under: Windows 10 x64 (build 18363), RStudio 1.2.5019

    Locale:
      LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252
      LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
      LC_TIME=English_United Kingdom.1252

    Package version:
      base64enc_0.1.3 digest_0.6.23   evaluate_0.14   glue_1.3.1      graphics_3.6.1  grDevices_3.6.1 highr_0.8
      htmltools_0.4.0 jsonlite_1.6    knitr_1.26      magrittr_1.5    markdown_1.1    methods_3.6.1   mime_0.7
      Rcpp_1.0.3      rlang_0.4.2     rmarkdown_1.18  stats_3.6.1     stringi_1.4.3   stringr_1.4.0   tinytex_0.17
      tools_3.6.1     utils_3.6.1     xfun_0.11       yaml_2.2.0

    Pandoc version: 2.7.2

  • Please also provide your `xfun::session_info('rmarkdown')`. – Yihui Xie Dec 14 '19 at 15:27
  • Done! Added at the bottom of the post. – jorellanaf Dec 14 '19 at 18:22
  • Just want to point out something I hadn't noticed: the HTML output also contains only the same `<U+XXXX>` Unicode literals, so it's not just a LaTeX issue! – jorellanaf Dec 19 '19 at 10:50
  • Rendering a .md file also prints out the Unicode literals. Even the generated .knit.md file already has the Armenian text as Unicode literals. It is simply not possible to knit any Armenian text in any format, because somewhere along the way knitr/R Markdown/Pandoc forces Armenian characters to Unicode literals when generating the .md file. In the .knit.md file, the heading appears as `###` followed by the same kind of literals. – jorellanaf Apr 15 '20 at 08:21
  • Likely duplicate of [this question](https://stackoverflow.com/q/44153072). – jorellanaf Jul 27 '20 at 19:08
