
I want to use Unicode superscripts and subscripts in variable names for my math homework, since that makes it easier to relate the variables to the math: σ² for variance, μ₁ and μ₂ for means, and so on. They also look nicer than, say, μ_1 or μ1.

I can create variables with Unicode names in R. For example,

μ = 2

works fine, but Unicode superscripts and subscripts don't work. Something like

μ₂ = 3

will give me an error:

> μ₂ = 3
Error: unexpected input in "μ₂"

Output of make.names gives me:

> make.names("μ₂")
[1] "μ."

This is similar to how special symbols like + and - are replaced by make.names, so I thought ² or ₂ might have a special meaning, but they don't seem to:

> ²
Error: unexpected input in "²"
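One way to test whether a name can be typed as a bare identifier is to compare it with its make.names result. The sketch below also shows a possible workaround: R accepts non-syntactic names when they are quoted with backticks. It assumes a UTF-8 session like the one described here, and `is_syntactic` is a hypothetical helper, not a base R function.

```r
# Hypothetical helper: TRUE when make.names leaves the name unchanged,
# i.e. the name can be written as a bare identifier.
is_syntactic <- function(name) make.names(name) == name

ok_plain <- is_syntactic("mu_2")  # an ordinary ASCII name, for comparison
ok_sub   <- is_syntactic("μ₂")    # the name that the parser rejected above

# Backtick quoting lets non-syntactic names be assigned and read.
# Assumption: the session and the source file are both UTF-8.
`μ₂` <- 3
`σ²` <- 1.5
total <- `μ₂` + `σ²`
print(total)
```

The cost is that every use of such a variable needs the backticks, so it is less convenient than a plain name like mu_2.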

There are some related questions about Unicode, but they aren't exactly what I want.

My files are saved as UTF-8, and Sys.getenv() shows LANG as en_US.UTF-8. Even if I run Sys.setlocale("LC_ALL", 'en_US.UTF-8') (from another question), nothing changes.

Atreyagaurav
  • Those probably aren't valid characters in R identifiers. – Shawn Apr 20 '22 at 21:59
  • That's confusing: for a character to be invalid, doesn't it need to have some other meaning? It can't be invalid for no reason, since it's the same as any other Unicode character. Is there nothing we can do? – Atreyagaurav Apr 20 '22 at 22:08
  • See https://cran.r-project.org/doc/manuals/r-release/R-lang.html#Identifiers – Shawn Apr 20 '22 at 23:00
  • @Atreyagaurav: they are not the same as other Unicode characters. Unicode characters come in many categories, with many properties, and there is a subset intended for identifiers. In any case, do not use Unicode subscripts and superscripts. They exist only for compatibility with some old character sets, and they are second-class citizens in Unicode. Unicode is not about formatting. – Giacomo Catenazzi Apr 21 '22 at 07:08
  • @Shawn `printf("%d\n", isalnum('₂'));` gives me a segmentation fault. Looking at the values, `printf("%d\n", 'μ');` ⇒ 52924 (`isalnum('μ')` is true), compared to `printf("%d\n", '²');` ⇒ 49842 (`isalnum('²')` is false) and `printf("%d\n", '₂');` ⇒ 14844546. So I assume it only supports 8-bit chars, and μ and σ only worked because they are in the ASCII range and considered letters. So it looks like, unlike Julia, where you can use Unicode in variable names, R doesn't fully support Unicode but just happens to support some of it by chance, and we're not supposed to use it. – Atreyagaurav Apr 22 '22 at 13:40
  • Sorry, it's probably 16-bit, not 8-bit, but I hope you get what I mean. – Atreyagaurav Apr 22 '22 at 13:48
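
The integers printed in the C experiment above can be reproduced from R, which suggests they are each character's UTF-8 bytes packed into a single integer rather than its code point. The sketch below uses base R's `utf8ToInt` and `charToRaw`; `packed_utf8` is a hypothetical helper written for this illustration, and the strings are given as `\u` escapes so the result does not depend on the file encoding.

```r
# Hypothetical helper: pack the UTF-8 bytes of one character into a single
# number, most significant byte first (mimicking a C multi-character constant).
packed_utf8 <- function(ch) {
  bytes <- as.integer(charToRaw(enc2utf8(ch)))
  sum(bytes * 256^rev(seq_along(bytes) - 1))
}

mu   <- "\u03bc"  # μ
sup2 <- "\u00b2"  # ²
sub2 <- "\u2082"  # ₂

# Unicode code points: all three are above 127, so none is ASCII.
code_points <- c(utf8ToInt(mu), utf8ToInt(sup2), utf8ToInt(sub2))
print(code_points)  # 956 178 8322

# Packed UTF-8 bytes: these match the values printed by the C experiment.
packed <- c(packed_utf8(mu), packed_utf8(sup2), packed_utf8(sub2))
print(packed)       # 52924 49842 14844546
```

Since μ is also outside ASCII, its acceptance as an identifier character is more plausibly explained by its Unicode category (a letter) than by its byte range, which fits the earlier comment about identifier categories.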
