I was working on a toy project and tried using some Unicode variable names to match a paper I was implementing.

The following code works fine on R 3.4.3 on Windows (RStudio version 1.1.456) and R 3.5.1 on OSX:

> µ <- function(ß, n) ß * n
> µ(2, 3)
[1] 6

This code gives the following error, with α typed as ALT+224:

> α <- 2
Error: unexpected input in "\"

The file was saved as UTF-8, so this is surprising to me.

make.names is consistent with the results above:

> make.names('µ')
[1] "µ"
> make.names('α')
[1] "a"

What is the rule for non-ASCII letters? Why are mu (µ) and scharfes S (ß) OK but alpha (α) isn't?

Edit: Output of sessionInfo()

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_3.4.3 tools_3.4.3    yaml_2.2.0 

Edit2: It seems like Sys.setlocale should be the answer, but here is what happens when I try this:

> Sys.setlocale("LC_ALL", 'en_US.UTF-8')
[1] ""
Warning message:
In Sys.setlocale("LC_ALL", "en_US.UTF-8") :
  OS reports request to set locale to "en_US.UTF-8" cannot be honored
Josh Rumbut
  • it probably has to do with which ones are in the [Windows code page](https://en.wikipedia.org/wiki/Windows_code_page); can you show the results of `sessionInfo()`, which might have some information on that? – Ben Bolker Aug 25 '18 at 18:47
  • For what it's worth, you'll drive typography nerds crazy by using ß for beta ... – Ben Bolker Aug 25 '18 at 18:47
  • @BenBolker I just corrected it, I used an incorrect list of ALT codes for this – Josh Rumbut Aug 25 '18 at 18:48
  • @BenBolker I added `sessionInfo()`, I guess I see where this is going but having the option to save the file as UTF-8 then interpreting it as Win-1252 seems like a bug? – Josh Rumbut Aug 25 '18 at 18:51
  • I don't really know -- I rarely use Windows. Just trying to guess the direction in which you should look for an answer. – Ben Bolker Aug 25 '18 at 18:54
  • Likewise, and I guess now we see why! Windows-1252 has mu and scharfes and not the others. If you want to add that as an answer I will accept it unless someone has some crazy other solution. – Josh Rumbut Aug 25 '18 at 19:01
  • fine with me if you write it up. Would be nice to get a definitive answer about whether there's a way to deal with this (your question asks "why are some letters OK and others not?" -- we know the answer to that, but not to the more interesting "is there a way to make R handle this?" question ...) – Ben Bolker Aug 25 '18 at 19:09

1 Answer

Working with Ben Bolker, we determined that the session was using the Windows-1252 character encoding, which contains only a handful of non-ASCII characters (µ and ß among them, but not α). This is despite the fact that RStudio saved the file as UTF-8.
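One way to check which letters the code page can hold is to try converting them with `iconv()`, which returns `NA` for input it cannot convert. This is only a minimal sketch: the encoding name `"CP1252"` is an assumption about what the local iconv implementation accepts, and the variable names are made up for illustration.

```r
# Code points for the three letters from the question.
code_points <- c(mu = 0x00B5, eszett = 0x00DF, alpha = 0x03B1)

# Build the letters as UTF-8 strings without typing them literally,
# so the snippet itself does not depend on the session encoding.
letters_utf8 <- intToUtf8(code_points, multiple = TRUE)

# iconv() gives NA for any element it cannot convert to the target encoding.
converted <- iconv(letters_utf8, from = "UTF-8", to = "CP1252")

representable <- !is.na(converted)
names(representable) <- names(code_points)
print(representable)
```

If the conversion behaves as expected, mu and eszett should come back as representable and alpha should not, which matches the behaviour seen in the question.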

Changing the locale of a running R session to a UTF-8 one does not seem to be possible. At least on Windows, `Sys.setlocale` returns "" and gives a warning (see the question).
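To see what the session is actually using, the locale can be inspected without changing it. The sketch below relies only on `Sys.getlocale()` and `l10n_info()` from base R; the `codepage` element is, as far as I know, only present on Windows, so it is checked before use.

```r
# The character-type locale of the running session,
# e.g. "English_United States.1252" in the question.
ctype <- Sys.getlocale("LC_CTYPE")

# l10n_info() reports whether the session is multi-byte, UTF-8 or Latin-1.
info <- l10n_info()

cat("LC_CTYPE:", ctype, "\n")
cat("UTF-8 session:", info[["UTF-8"]], "\n")

# Assumption: the code page entry only exists on Windows builds.
if (!is.null(info$codepage)) {
  cat("Code page:", info$codepage, "\n")
}
```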

I have a partial solution. If you are given a file like this and want to run it and have interactive access to the results, the following will mostly work (variable names will be translated to Win-1252):

> source('utf-8-file.r', encoding='UTF-8')
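For anyone who wants to try this without an existing file, here is a self-contained sketch of the same idea. The temporary file is a hypothetical stand-in for `utf-8-file.r`, and the guard is my own addition: it skips the run when the session is neither UTF-8 nor Latin-1, since µ and ß could not be used as names there anyway.

```r
# Script text using mu and eszett as names, forced to UTF-8.
script_text <- enc2utf8(c(
  "\u00b5 <- function(\u00df, n) \u00df * n",
  "result <- \u00b5(2, 3)"
))

# Write the raw UTF-8 bytes to a temporary file.
script <- tempfile(fileext = ".r")
con <- file(script, open = "wb")
writeLines(script_text, con, useBytes = TRUE)
close(con)

# Only attempt the run where the session encoding can hold these letters.
info <- l10n_info()
can_run <- isTRUE(info[["UTF-8"]]) || isTRUE(info[["Latin-1"]])

env <- new.env()
if (can_run) {
  # Tell source() the file is UTF-8; results land in 'env'.
  source(script, encoding = "UTF-8", local = env)
  print(env$result)
}
```

Sourcing into a separate environment is just a design choice to keep the example from cluttering the workspace; dropping `local = env` gives the interactive access described above.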

I would be very excited to see a better solution, one which allows editing and running the file and entering such snippets into the console of RStudio on Windows.

Josh Rumbut