I am using R in a Windows environment. When I use sink() to direct output to a file, I can't set the encoding to UTF-8.

sink("Umlaute.tex", append=FALSE, split=TRUE)
cat("ÄÖÜäöüß")
sink()

How can I set output encoding to UTF-8?

  • Are you using RStudio? If so, it is worth checking whether you can update your version; a recent release improved UTF-8 support, so an older version might be what is causing the problem. – Phil Aug 15 '16 at 13:20
  • I am using RStudio 0.99.903 and R 3.3.1. RStudio's file encoding is set to UTF-8, as are all my LaTeX files. That's why I would like R's output via sink to be in UTF-8 as well. – Michael Schmitz Aug 15 '16 at 16:21

1 Answer

You can open a connection with the correct encoding first and then sink to that connection. This also gives you more control over how the file is opened.

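# Open a text-mode connection that encodes output as UTF-8
con <- file("Umlaute.tex", open = "wt", encoding = "UTF-8")
sink(con, split = TRUE)  # split = TRUE also echoes output to the console
cat("ÄÖÜäöüß")
sink()      # stop diverting output
close(con)  # flush and close the file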
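
To confirm that the file really ends up as UTF-8, one option is to read its raw bytes back and compare them with the UTF-8 encoding of the string. This is a minimal sketch, assuming the characters are representable in the session's locale; the temporary file and the variable names are illustrative only.

```r
# \u escapes keep the script ASCII-only, so it does not depend on
# how the source file itself is encoded.
umlauts <- "\u00c4\u00d6\u00dc\u00e4\u00f6\u00fc\u00df"  # the umlaut string from the question

out_file <- tempfile(fileext = ".tex")  # illustrative path, not from the thread

# Same approach as above: sink to a connection opened with encoding = "UTF-8"
con <- file(out_file, open = "wt", encoding = "UTF-8")
sink(con)
cat(umlauts)
sink()
close(con)

# Read the file back as raw bytes, without any decoding
written <- readBin(out_file, what = "raw", n = file.size(out_file))
print(written)

# Compare against the UTF-8 encoding of the original string
print(identical(written, charToRaw(enc2utf8(umlauts))))
```

If the comparison is FALSE, the printed bytes show what was actually written, which helps tell a locale conversion problem apart from a viewer that is simply displaying the file with the wrong encoding.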
AEF
  • This only works for strings representable in the current locale and won't be fixed: https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17503 – AlexR Nov 22 '18 at 21:44
  • This could do the trick, but it will print everything: split = T becomes the default and split = F will not work. – GuillaumeL Dec 02 '20 at 14:06
  • Does anyone know where I can find a list of strings representable in my current locale? I'm surprised by some of the things not covered. In addition, I'm surprised this text with diacritical marks is covered in the US set. – pdb Dec 23 '20 at 23:47
  • @pdb On Windows, one way is to run `l10n_info()` in R. This will either tell you that you are using UTF-8, or give you a Windows code page number, whose character set you can easily look up on the internet. Not sure how it behaves on Unix though. Sidenote: The R docs mention that the standard code page for R is 1252 (Western European) - so that may explain why you could represent the diacritical marks you mentioned. – AEF Dec 24 '20 at 08:19
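
The locale limitation raised in the comments can be inspected, and in some cases sidestepped, by not going through sink() at all. The sketch below assumes the text is already held in a character vector: it checks the session with `l10n_info()` and then writes UTF-8 bytes directly with `writeLines(..., useBytes = TRUE)`. The variable names, the sample string, and the temporary file are illustrative only and do not come from the thread.

```r
# Inspect the session's locale; on Windows the list may also
# contain a `codepage` element.
info <- l10n_info()
print(info[["UTF-8"]])  # TRUE if the session's native encoding is UTF-8

# \u escapes keep the script itself ASCII-only.
# Umlauts plus two CJK characters that code page 1252 cannot represent.
txt <- "\u00c4\u00d6\u00dc\u00e4\u00f6\u00fc\u00df \u4e2d\u6587"

out_file <- tempfile(fileext = ".tex")  # illustrative path

# Binary mode avoids newline translation; useBytes = TRUE writes the
# UTF-8 bytes as they are instead of re-encoding them to the native locale.
con <- file(out_file, open = "wb")
writeLines(enc2utf8(txt), con, useBytes = TRUE)
close(con)

# Read the file back, declaring its lines as UTF-8, and compare
back <- readLines(out_file, encoding = "UTF-8")
print(identical(back, txt))
```

The trade-off is that this only covers text you build yourself; output printed by other functions would first have to be captured into a character vector, for example with `capture.output()`, which is itself subject to the locale.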