ggplot: aes vs aes_string, or how to programmatically specify column names?

Question

Let's assume we have the following data frame

data <- data.frame(time=1:10, y1=runif(10), y2=runif(10), y3=runif(10))

and we want to create a plot like this:

p <- ggplot(data, aes(x=time))
p <- p + geom_line(aes(y=y1, colour="y1"))
p <- p + geom_line(aes(y=y2, colour="y2"))
p <- p + geom_line(aes(y=y3, colour="y3"))
plot(p)

But what if we have many more "y" columns and do not know their exact names? This raises the question: how can we iterate over all columns programmatically and add them to the plot? Basically, the goal is:

otherFeatures <- names(data)[-1]
for (f in otherFeatures) {
  # what goes here?
}

Failed Attempts

So far I have found many ways that do not work. For instance (all following examples only show the code line in the above for loop):

My first try was simply to use aes_string instead of aes in order to specify the column name by the loop variable f:

p <- p + geom_line(aes_string(y=f, colour=f))

But this does not give the same result, because colour is no longer a fixed value for each line: aes_string parses the string in f as an expression, so colour now refers to the column of that name in the data frame. As a result, the legend becomes a colour bar and does not contain the different column names. My next guess was to mix aes and aes_string, trying to set colour to a fixed string:

p <- p + geom_line(aes_string(y=f), aes(colour=f))

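But this results in Error: ggplot2 doesn't know how to deal with data of class uneval, presumably because the second positional argument of geom_line is data, so the aes(...) object is taken as the data set. My next attempt was to use colour "absolutely" (not within aes) like this: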

p <- p + geom_line(aes_string(y=f), colour=f)

But this gives Error: invalid color name 'y1' (and I don't want to pick some proper color names manually either). The next try was to go back to aes only, replicating the manual approach:

p <- p + geom_line(aes(y=data[[f]], colour=f))

This does not give an error, but only the last column gets plotted. This makes sense, since aes probably captures its arguments unevaluated (e.g. via substitute), so the expression is only evaluated when the plot is drawn, with the last value of f from the loop (removing f with rm(f) before calling plot(p) gives an error, which indicates that the evaluation happens after the loop).
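
The late evaluation described above can be reproduced without ggplot2. The following is a minimal base R sketch, assuming a small toy data frame; the names lazy_exprs and fixed_exprs are made up for illustration. quote() stores an expression that still refers to the variable f, whereas bquote() substitutes the value f has at that moment.

```r
# Toy data standing in for the data frame from the question (an assumption)
data <- data.frame(time = 1:3, y1 = c(1, 2, 3), y2 = c(10, 20, 30))

lazy_exprs  <- list()  # expressions that still refer to the variable f
fixed_exprs <- list()  # expressions with the current value of f baked in

for (f in c("y1", "y2")) {
  lazy_exprs[[f]]  <- quote(data[[f]])      # keeps the symbol f, not its value
  fixed_exprs[[f]] <- bquote(data[[.(f)]])  # inserts the value of f right now
}

# After the loop f is "y2", so the "lazy" expression stored under "y1"
# returns column y2, while the "fixed" one still returns column y1.
lazy_y1  <- eval(lazy_exprs[["y1"]])
fixed_y1 <- eval(fixed_exprs[["y1"]])
```

Here lazy_y1 equals data$y2 and fixed_y1 equals data$y1, which mirrors the plot where every layer shows the last column.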

To rephrase the question: What kind of substitute/eval/quote magic is necessary to replicate the simple code from above within a for loop?

melt the data then plot? http://stackoverflow.com/questions/17150183/r-plot-multiple-lines-in-one-graph – rawr, Nov 12 '14 at 21:00
@rawr: This is only an example problem showing the general issues I have with `aes` vs `aes_string`. So I'm actually not looking for a work-around to avoid the for loop. My goal is to understand how to write such a loop in general. – bluenote10, Nov 13 '14 at 08:58
how about `p <- ggplot(data, aes(x = time));for (ii in names(data[, -1])) {col <- palette()[which(names(data) %in% ii)]; p <- p + geom_line(aes_string(y = ii), colour = col)}; p` then – rawr, Nov 13 '14 at 15:00
@rawr: Thanks, this is interesting. But it also does not achieve full equivalence to the unrolled loop (there is no legend and the manual use of palette() seems to lead to different colors). This can obviously be fixed, but my goal here is rather to learn something about non-standard evaluation: Given an unrolled loop, how is it possible to produce the exact same behavior in an actual loop? I was hoping that it is possible to "hide" the expressions in my last example from NSE. – bluenote10, Nov 13 '14 at 15:38

score 5 · Accepted Answer · edited Jun 14 '16 at 13:24

This is old now, but in case anyone else comes across it: I had a very similar problem that was driving me crazy. The solution I found was to pass aes_q() to geom_line(), wrapping the column name in as.name(). You can find details in the ggplot2 help page for aes_q(). Below is the way I would solve this problem, though the same principle should work in a loop. Note that I add multiple variables with geom_line() as a list here, which generalizes better (including to one variable).

varnames <- c("y1", "y2", "y3")
add_lines <- lapply(varnames, function(i) geom_line(aes_q(y = as.name(i), colour = i)))

p <- ggplot(data, aes(x = time))
p <- p + add_lines
plot(p)

Hope that helps!
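
For readers on newer ggplot2 versions: as far as I know, aes_q() and aes_string() have been deprecated since ggplot2 3.0.0 in favour of tidy evaluation. The sketch below is not part of the original answer; it assumes ggplot2 3.0.0 or later and shows the same idea written as the for loop from the question. The !! operator inlines the current value of f into the mapping, so nothing refers to the loop variable when the plot is drawn.

```r
library(ggplot2)

set.seed(1)  # only to make the sketch reproducible
data <- data.frame(time = 1:10, y1 = runif(10), y2 = runif(10), y3 = runif(10))

p <- ggplot(data, aes(x = time))
for (f in names(data)[-1]) {
  # .data[[!!f]] becomes .data[["y1"]] etc.; colour = !!f becomes a constant
  # string, just like colour = "y1" in the unrolled version of the question.
  p <- p + geom_line(aes(y = .data[[!!f]], colour = !!f))
}
```

Calling plot(p) afterwards should draw one line per column with a discrete legend, as in the unrolled code.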

score 3 · Answer 2 · blakeoft · edited Nov 12 '14 at 21:10

You could melt (thanks for reminding me of this function, rawr) all of your data into a few columns. For example, it could look like this:

library(reshape2)    
data2 <- melt(data, id = "time")
head(data2)
#    time variable       value
# 1     1       y1 0.353088575
# 2     2       y1 0.621565368
# 3     3       y1 0.696031085
# 4     4       y1 0.507112969
# 5     5       y1 0.009560710
# 6     6       y1 0.158993988
ggplot(data2, aes(x = time, y = value, color = variable)) + geom_line()

answered Nov 12 '14 at 21:02

I think this is essentially what rawr meant by suggesting `melt`. – blakeoft Nov 12 '14 at 21:04
For the record, `melt(...)` is in the `reshape2` package. This is the idiomatic way: `library(reshape2); gg <- melt(data,id="time"); ggplot(gg,aes(x=time,y=value,color=variable))+geom_line()`. – jlhoward Nov 12 '14 at 21:08
Thanks! This is a very nice work-around for this particular problem. However, I ran into this "mixing of `aes` and `aes_string`" problem in different situations, where melting is not necessarily applicable. Therefore I'm still looking for a solution to write this simple for loop. – bluenote10 Nov 13 '14 at 08:55

score 1 · Answer 3 · answered Apr 07 '15 at 23:01

NOTE: This is not really an answer, just a very partial explanation of what is going on behind the scenes that might set you on the right track. I have to admit my understanding of NSE is still very basic.

I have struggled and am still struggling with this particular issue. I have narrowed down the issue to NSE. I am not familiar with R's native substitute/quote/eval stuff, so I am going to demonstrate using the lazyeval package.

library(lazyeval)

a <- lapply(c(1:9,13), function(i) lazy(i))

head(a)
# [[1]]
# <lazy>
#   expr: c(1, 2, 3, 4, 5, 6, 7, 8, 9, 13)[[10L]]
#   env:  <environment: 0x25889a00>
# 
# [[2]]
# <lazy>
#   expr: c(1, 2, 3, 4, 5, 6, 7, 8, 9, 13)[[10L]]
#   env:  <environment: 0x25889a00>
#
# ...........

lazy_eval(a[[1]])
# [1] 13

lazy_eval(a[[2]])
# [1] 13

I think this happens because lazy(i) binds to the promise of i. By the time any of these promises is evaluated, i is whatever was LAST assigned to it, in this case 13. Perhaps this is due to the environment in which i is evaluated being shared over all iterations of the lapply function?
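
The same effect can be shown in base R without lazyeval. The sketch below is only an illustration under my own assumptions: make_lazy and make_forced are hypothetical helper names, and a for loop is used because, as far as I know, lapply has forced the arguments of the applied function since R 3.2.0, so the lapply output above may not reproduce in current R versions.

```r
# Function factories: the returned closure reports the argument i.
make_lazy   <- function(i) function() i                # i stays an unevaluated promise
make_forced <- function(i) { force(i); function() i }  # i is evaluated immediately

lazy   <- list()
forced <- list()
for (k in 1:3) {
  lazy[[k]]   <- make_lazy(k)    # promise still points at the variable k
  forced[[k]] <- make_forced(k)  # value of k is fixed at this iteration
}

# The promises in `lazy` are first evaluated here, after the loop,
# when k is already 3.
lazy_values   <- vapply(lazy, function(g) g(), integer(1))    # 3 3 3
forced_values <- vapply(forced, function(g) g(), integer(1))  # 1 2 3
```

force() is the usual way to pin down an argument at the time a closure is created.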

I have had to resort to the same workarounds as you, through aes_string and aes_q. I found them quite unsatisfactory, as they are neither (1) fully consistent with aes behavior nor (2) particularly clean. Oh, the joys of learning NSE ;)

You can inspect the source code of the + operator and of the aes functions by printing them:

ggplot2:::`+.gg`
ggplot2:::aes
ggplot2:::aes_q
ggplot2:::aes_string
