Correct display of chemical formulae in ggplot axis category labels

Question

I'm plotting a data set with chemical formulae as categories, and values associated with each:

data <- data.frame(compound = factor(c("SiO[2]", "Al[2]O[3]", "CaO")),
                value = rnorm(3, mean = 1, sd = 0.25))

I want to get the subscripts in the chemical formulae to display correctly in the axis labels. I've tried various solutions involving bquote(), label_parsed(), scales::parse_format() and ggplot2:::parse_safe (as per this thread), but all of those give me either no category labels at all or a mess. For example:

ggplot(data = data, aes(x = compound, y = value)) +
  geom_col() +
  scale_x_discrete(labels = scales::parse_format())

Gives this error message:

Error in parse(text = x, srcfile = NULL) : 1:6: unexpected symbol
1: Al[2]O
         ^
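As the error message shows, `parse_format()` sends each label through R's `parse()`, so every label has to be a syntactically valid R expression. In "Al[2]O[3]" the two subscripted terms sit side by side with no operator between them, which the parser rejects. A quick check along these lines (the helper name `parses_ok()` is just for illustration) shows which strings survive:

```r
# Hypothetical helper: TRUE if the string is a valid R (plotmath) expression.
parses_ok <- function(s) {
  tryCatch({
    parse(text = s)
    TRUE
  }, error = function(e) FALSE)
}

candidates <- c("SiO[2]", "Al[2]O[3]", "Al[2]~O[3]", "Al[2]*O[3]", "CaO")
parses <- vapply(candidates, parses_ok, logical(1))
print(parses)  # only "Al[2]O[3]" is FALSE: no operator between Al[2] and O[3]
```

Both `~` and `*` join the two terms into a single valid expression; in plotmath `~` renders as a space, while `*` juxtaposes the terms with no space.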

Can anyone help? I've done this successfully before for axis titles (via labs() and then bquote() or similar), and there are various threads covering that problem, but the same solutions don't seem to work for category labels.

Just recently wrote a pretty long `bquote()` answer [here](https://stackoverflow.com/questions/58924729/plotting-sma-regressions-lines-smatr-package-into-ggplot). Maybe that will help? I think you'll want to isolate the `bquote()` stuff just to the `labs()` label call. — ravic_, Nov 21 '19 at 21:44
Oh, apologies, just noticed you tried `labs()` and `bquote()` already. If you can include sample data using the `dput()` function, that will help me and others answer your question. — ravic_, Nov 21 '19 at 21:47
Thanks @ravic_. That example you linked to is a little different as the non-standard characters are in the axis label itself, rather than the categories along the axis. I think the dummy data in my question is enough to reproduce the problem, but let me know if I'm missing something! — TimM, Nov 21 '19 at 21:51
Harder than it looks, but I think I have a workable solution for you now. — ravic_, Nov 21 '19 at 23:44

ravic_ · Accepted Answer · 2019-11-22T03:50:52.773

UPDATED: I finally got the right parse() routine: if the chemical formulae are already stored in the data frame as valid plotmath strings, they can simply be parsed to produce the proper labels. (Note that aluminum oxide needs an operator between its two terms, here the tilde (~), because "Al[2]O[3]" on its own is not a valid R expression.)

library(tidyverse)
library(rlang)
# store the formulae as valid plotmath strings; plain "CaO" needs no subscript
compounds <- c("SiO[2]", "Al[2]~O[3]", "CaO")
data <- tibble(compound = compounds,
               value = rnorm(3, mean = 1, sd = 0.25))
data %>%
  ggplot(aes(x = compound, y = value)) +
  geom_col() +
  # parse_exprs() turns each category string into an expression for plotmath
  scale_x_discrete(labels = rlang::parse_exprs)

Created on 2019-11-21 by the reprex package (v0.3.0)
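If you would rather not depend on rlang, base R's `parse(text = ...)` should do the same job: given a character vector of plotmath strings, it returns an expression vector with one element per string, which ggplot2 draws using plotmath. This is a swapped-in base-R equivalent, not what the answer above used, and `parse_labels()` is a hypothetical name:

```r
# Hypothetical base-R labeller: each input string becomes one element of an
# expression vector (one complete expression per string).
parse_labels <- function(x) parse(text = x)

compounds <- c("SiO[2]", "Al[2]~O[3]", "CaO")
labels_expr <- parse_labels(compounds)
length(labels_expr)  # one expression per category
```

In the plot it would replace the rlang call, e.g. `scale_x_discrete(labels = parse_labels)`.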

PREVIOUS UPDATE: Replaced the code with something slightly more extensible: a named translation table that maps each category to its bquote() expression. Same basic idea, but the labels are no longer hard-wired, so this should work with filters, facets, etc.


library(tidyverse)
compounds = c("SiO[2]", "Al[2]O[3]", "CaO[1]")
# named list: data value -> plotmath expression; ggplot2 matches the names to the axis breaks
translation = c("SiO[2]" = bquote(SiO[2]),
                "Al[2]O[3]" = bquote(Al[2] ~ O[3]),
                "CaO[1]" = bquote(CaO))
data <- tibble(compound = compounds,
               value = rnorm(3, mean = 1, sd = 0.25))
ggplot(data = data, aes(x = compound, y = value)) +
  geom_col() + 
  scale_x_discrete(labels = translation)
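The translation table does not have to be written out one bquote() at a time. A sketch, assuming you keep a second vector of plotmath strings alongside the plain names used in the data (both vectors here are illustrative), builds the same kind of named list with base R's `str2lang()` (available since R 3.6.0), which parses a single string into a call or symbol:

```r
# Values as they appear in the data (plain text) ...
plain <- c("SiO2", "Al2O3", "CaO")
# ... and the matching plotmath strings; '*' joins terms without a space
plotmath <- c("SiO[2]", "Al[2]*O[3]", "CaO")

# Named list: plain data value -> parsed plotmath expression
translation <- setNames(lapply(plotmath, str2lang), plain)
```

It plugs in exactly like the hand-written version, `scale_x_discrete(labels = translation)`, and lets the data keep plain category names.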

Thanks for that! Does the trick. Would still be interested in something that does this more systematically (for when there are more formulae involved than in my little reprex), if you or anyone else has any bright ideas. But for now, problem solved. Thanks again. — TimM, Nov 21 '19 at 23:43
Okay @TimM, this was bugging me. Even though it sort of worked before, the latest update now actually does what you want: it parses the expressions you already maintain in your data frame. Phew. Not doing that again. :) — ravic_, Nov 22 '19 at 03:51
Using "Al[2]~O[3]" adds a space between the two terms, so it is better to use "Al[2]*O[3]" instead. — Pedro J. Aphalo, Aug 15 '22 at 12:06
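For the more systematic route asked about above, one option is to keep plain formulae such as "Al2O3" in the data and generate the plotmath strings automatically. The helper below, `to_plotmath()`, is hypothetical (not part of ggplot2 or any package). It assumes simple formulae made only of element symbols and integer counts, and does not handle parentheses, charges, or hydrates:

```r
# Hypothetical helper: "Al2O3" -> "Al[2]*O[3]", "CaO" -> "Ca*O".
to_plotmath <- function(formula) {
  vapply(formula, function(f) {
    # one token per element symbol plus optional count, e.g. "Al2" or "O"
    tokens <- regmatches(f, gregexpr("[A-Z][a-z]?[0-9]*", f))[[1]]
    # move a trailing count into a plotmath subscript: "Al2" -> "Al[2]"
    tokens <- sub("([0-9]+)$", "[\\1]", tokens)
    # '*' juxtaposes the terms without adding a space
    paste(tokens, collapse = "*")
  }, character(1), USE.NAMES = FALSE)
}

formulae <- c("SiO2", "Al2O3", "CaO", "Fe2O3")
pm <- to_plotmath(formulae)
exprs <- parse(text = pm)  # every generated string should parse cleanly
```

In the plot this could be combined with `parse()`, e.g. `scale_x_discrete(labels = function(x) parse(text = to_plotmath(x)))`, so the data itself stays in plain text.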
