
I'm having issues preparing an rmarkdown document in RStudio.
I'm importing a German data set that includes the umlaut "ü". When reading the table into RStudio, I have to include the umlaut in a string.

The document is produced without any other issues, but after the ü the syntax-highlighting colors are inverted: text that should be green is black and vice versa. I created an MWE that reproduces the problem.
In the MWE, the first chunk renders as I expect. In the second chunk, however, the string elements after the word 'lücky' are black.

Is there a way to avoid this?

MWE output

---
output: pdf_document
---

## MWE
When I use a normal 'u' in lucky everything looks fine
```{r }
a <- c('dog', 'cat', 'rabbit', 'lucky', 'pig', 'sheep', 'goat')
```

When I use a German 'ü' in lucky, the green text is the inverse of as it should be
```{r }
a <- c('dog', 'cat', 'rabbit', 'lücky', 'pig', 'sheep', 'goat')
```

Update with sessionInfo() and options('encoding'):

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] compiler_3.5.1  backports_1.1.2 magrittr_1.5    rprojroot_1.3-2 htmltools_0.3.6
 [6] tools_3.5.1     yaml_2.2.0      Rcpp_0.12.18    stringi_1.2.4   rmarkdown_1.10 
[11] knitr_1.20      stringr_1.3.1   digest_0.6.16   evaluate_0.11

> options('encoding')
$`encoding`
[1] "native.enc"
zx8754
Socadillo
  • I'm having a hard time reproducing this because (I believe) I don't have the same system encoding you have. I don't know precisely what is different, but can you share the output from `sessionInfo()` and `options("encoding")`? – r2evans Nov 05 '18 at 21:27
  • I have tried using a few different encoding options, but it changed the umlaut to gibberish, which caused errors – Socadillo Nov 05 '18 at 21:58
  • Ok, I confirmed one thing ... my copy/paste mangled the umlaut despite my intentions. So now I can reproduce it ... – r2evans Nov 05 '18 at 22:00
  • What happens with `ä` and `ö`? – jay.sf Nov 05 '18 at 22:08
  • Oddly enough, `ä` and `ö` work as they should. Only `ü` is causing the issue. – Socadillo Nov 05 '18 at 23:18
  • Interesting question. I can reproduce this with the "knit" button in RStudio. Cannot reproduce with `rmarkdown::render()`, but `render` turns the "ü" into gibberish (file encoding: UTF-8, ISO8859-1 and Windows-1252 make no difference). – CL. Nov 06 '18 at 07:18
  • The following link is a little old but might add a clue, as you are on Windows: [Unicode with knitr and Rmarkdown](https://stackoverflow.com/questions/44153072/unicode-with-knitr-and-rmarkdown). – steveb Nov 06 '18 at 07:24
  • I've tried several encoding methods. When rendering a PDF the text turns green, although it is not an issue when making HTML. The strange thing is it only occurs with `ü` – Socadillo Nov 07 '18 at 19:21

1 Answer


I used pdflatex as the LaTeX engine to reproduce this strange effect. Additionally, I enabled the option "Keep tex source used to produce PDF". The effect was reproducible, and inside the tex source I found the reason:

When I use a normal `u' in lucky everything looks fine

\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{a <-}\StringTok{ }\KeywordTok{c}\NormalTok{(}\StringTok{'dog'}\NormalTok{, }\StringTok{'cat'}\NormalTok{, }\StringTok{'rabbit'}\NormalTok{, }\StringTok{'lucky'}\NormalTok{, }\StringTok{'pig'}\NormalTok{, }\StringTok{'sheep'}\NormalTok{, }\StringTok{'goat'}\NormalTok{)}
\end{Highlighting}
\end{Shaded}

When I use a German `ü' in lucky, the green text is the inverse of as it
should be

\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{a <-}\StringTok{ }\KeywordTok{c}\NormalTok{(}\StringTok{'düg', '}\NormalTok{cat}\StringTok{', '}\NormalTok{rabbit}\StringTok{', '}\NormalTok{löcky}\StringTok{', '}\NormalTok{pög}\StringTok{', '}\NormalTok{sheep}\StringTok{', '}\NormalTok{goat}\StringTok{')}
\end{Highlighting}
\end{Shaded}

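After the ü, the tags are swapped for all following strings: the string contents are tagged as NormalTok instead of the expected StringTok, while the quotes and commas between them are tagged as StringTok. That's why the format changed.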
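To check this in a kept tex source without reading the tokens by eye, something like the following rough sketch could be used. `string_tokens` is a hypothetical helper (not part of knitr or rmarkdown), and the two sample lines are fragments of the output above, with the umlaut written as an escape so the snippet itself stays ASCII.

```r
# Hypothetical helper: return the contents of every \StringTok{...}
# found in one line of the generated .tex source.
string_tokens <- function(tex_line) {
  m <- regmatches(tex_line, gregexpr("\\\\StringTok\\{[^}]*\\}", tex_line))[[1]]
  sub("^\\\\StringTok\\{(.*)\\}$", "\\1", m)
}

# Fragment of the correctly highlighted line (plain 'u' chunk)
good <- "\\StringTok{'dog'}\\NormalTok{, }\\StringTok{'cat'}\\NormalTok{, }\\StringTok{'rabbit'}"

# Fragment of the broken line (umlaut chunk); \u{00fc} is the character ü
bad <- "\\StringTok{'d\u{00fc}g', '}\\NormalTok{cat}\\StringTok{', '}\\NormalTok{rabbit}"

string_tokens(good)  # each token is one complete quoted string
string_tokens(bad)   # tokens run across the commas; 'cat' and 'rabbit' are missing
```

In the good line every StringTok holds exactly one quoted word. In the broken line the tokens contain the separators instead, which matches the inverted colors in the PDF.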

So from my point of view it's related to the rendering engine: the tokens are already wrong in the generated tex file, before pdflatex runs.
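A possible workaround, which I have not verified on the asker's setup, is to keep the chunk source ASCII-only by writing the umlaut as a Unicode escape. This rests on the assumption that the literal ü in the source is what triggers the wrong tagging. A minimal sketch:

```r
# Write the umlaut as a Unicode escape so the chunk source contains only ASCII.
# \u{00fc} is the code point of the character ü.
a <- c('dog', 'cat', 'rabbit', 'l\u{00fc}cky', 'pig', 'sheep', 'goat')

# The escaped string holds the same character as a literal umlaut would
nchar(a[4])         # number of characters in the fourth element
utf8ToInt(a[4])[2]  # code point of its second character
```

The trade-off is that the echoed code in the PDF shows the escape sequence rather than the umlaut itself, so this only helps if the value of the string matters more than how the source looks.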

squeezer44