
My query is with reference to this reprex:

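library(ggplot2)

# Toy data: a true series and a noisy estimate of it
d1 <- data.frame(index = 1:100, x = 1:100, x_hat = 1:100 + rnorm(100))

# Two layers, each with a string constant mapped to color inside aes()
ggplot(data = d1) +
  geom_line(aes(x = index, y = x, color = "True X")) +
  geom_line(aes(x = index, y = x_hat, color = "Estimated X")) +
  scale_x_continuous(name = "") +
  ylab("")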

The code is doing what I want it to do but I don't know how it is doing it. When I say color = "True X" I think it is generating a variable on the fly which is a constant.

Is that correct? How does it work? Can someone say a few words on this? The beauty of this approach is that it automatically creates a correct legend.

Roman
user2338823
  • I wrote a post that is closely related to this [here](https://aosmith.rbind.io/2018/07/19/legends-constants-for-aesthetics-in-ggplot2/), which you might be interested in. – aosmith Aug 02 '18 at 15:21

1 Answer


Your intuition is basically correct. A string constant inside aes() is treated as a mapped variable: ggplot2 evaluates it to a constant column for that layer, and the color scale then assigns a default color to each unique value and adds a matching legend entry. If you specified the same string in both geoms (e.g. color = "True X"), both lines would be drawn in the same reddish default color and the legend would have only one label. In other words, each unique string constant tells ggplot2 to draw the respective line in a different color and to add a label to the legend.
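To see this for yourself, here is a minimal sketch that inspects the data ggplot2 computes for each layer. It reuses the data frame from the question; the object names `p` and `p_fixed` and the `set.seed()` call are my own additions for illustration. `layer_data()` returns the computed data for one layer, after the scale has turned each string into an actual color.

```r
library(ggplot2)

set.seed(1)  # assumption: seed added only to make the sketch reproducible
d1 <- data.frame(index = 1:100, x = 1:100, x_hat = 1:100 + rnorm(100))

# String constants INSIDE aes(): mapped through the color scale, legend drawn
p <- ggplot(d1) +
  geom_line(aes(x = index, y = x, color = "True X")) +
  geom_line(aes(x = index, y = x_hat, color = "Estimated X"))

# Each layer carries a single colour value, and the two layers differ
col_true <- unique(layer_data(p, 1)$colour)
col_est  <- unique(layer_data(p, 2)$colour)
print(col_true)
print(col_est)

# The same kind of string OUTSIDE aes() is a fixed setting, not a mapping:
# it is used literally as the color and no legend is created
p_fixed <- ggplot(d1) +
  geom_line(aes(x = index, y = x), color = "blue")
col_fixed <- unique(layer_data(p_fixed, 1)$colour)
print(col_fixed)
```

The contrast between `p` and `p_fixed` is the important part: inside aes() the string is data to be scaled, outside aes() it is taken as the color itself.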

If you want to customize further, you can add scale_color_manual to your call to ggplot. For instance, scale_color_manual("Type of X", values = c("blue", "red")) would add a proper title to the legend and change the colors of the two lines (here to blue and red). Unnamed values are assigned in the order of the legend labels, which is alphabetical by default, so "Estimated X" becomes blue and "True X" becomes red.
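If you would rather not depend on the order of the labels, a named vector makes the pairing explicit. This is only a sketch using the question's data; the object name `p_manual`, the seed, and the particular color choice are assumptions for illustration.

```r
library(ggplot2)

set.seed(1)  # assumption: seed added only for reproducibility
d1 <- data.frame(index = 1:100, x = 1:100, x_hat = 1:100 + rnorm(100))

p_manual <- ggplot(d1) +
  geom_line(aes(x = index, y = x, color = "True X")) +
  geom_line(aes(x = index, y = x_hat, color = "Estimated X")) +
  # Names must match the strings used inside aes()
  scale_color_manual("Type of X",
                     values = c("True X" = "blue", "Estimated X" = "red"))

# Check which color each layer ended up with
col_true <- unique(layer_data(p_manual, 1)$colour)
col_est  <- unique(layer_data(p_manual, 2)$colour)
print(c(true = col_true, estimated = col_est))
```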

tifu
  • It might also be worth noting that your approach hacks around ggplot2's preference for tidy data. If you gathered your data to create a tidy data frame (e.g. via `d1 %>% gather(type.of.x, value, 2:3) -> d2`), you can avoid having to specify two separate calls to `geom_line`, like so: `ggplot(d2, aes(x = index, y = value, color = type.of.x)) + geom_line() + scale_color_manual("Type of X", labels = c("True X", "Estimated X"), values = c("blue", "red"))`. Note that the labels follow the alphabetical order of the gathered keys (`x`, then `x_hat`). – tifu Aug 02 '18 at 06:48
  • So ggplot is generating the matrix (which will be like the `gather` example you have given) to be plotted on the fly? – user2338823 Aug 02 '18 at 08:52
  • No, I don't think that is what happens. It simply plots your two specified layers separately and assigns default colors to the two strings you indicated in `color = `. You can (to an extent) assess what is going on internally by saving your ggplot object to the environment and investigating it with `View(yourggobject)`. – tifu Aug 02 '18 at 10:07
  • Hi, I am able to see the data.frame built on the fly by using `ggplot_build(ggplot(data = d1) + geom_line(aes(x = index, y = x, color = "True X")) + geom_line(aes(x = index, y = x_hat, color = "Estimated X")) + scale_x_continuous(name = "") + ylab(""))` – user2338823 Aug 02 '18 at 11:31
  • That is a cool feature! Notice how the `data` element of this object is a list of two data frames, but when `gather()` is applied beforehand there is only one. So with your approach the plot is drawn from two separate data frames (one per layer), whereas after `gather()` a single data frame suffices. – tifu Aug 02 '18 at 12:39
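The comparison made in the comments can be sketched as follows. I use `tidyr::pivot_longer()` here instead of `gather()`, which tidyr now marks as superseded; the column names `type_of_x` and `value`, the object names, and the seed are assumptions chosen for illustration.

```r
library(ggplot2)
library(tidyr)

set.seed(1)  # assumption: seed added only for reproducibility
d1 <- data.frame(index = 1:100, x = 1:100, x_hat = 1:100 + rnorm(100))

# Approach from the question: two layers, one string constant each
p_layers <- ggplot(d1) +
  geom_line(aes(x = index, y = x, color = "True X")) +
  geom_line(aes(x = index, y = x_hat, color = "Estimated X"))

# Tidy approach: one row per (index, series) pair, a single layer
d2 <- pivot_longer(d1, cols = c(x, x_hat),
                   names_to = "type_of_x", values_to = "value")
p_tidy <- ggplot(d2, aes(x = index, y = value, color = type_of_x)) +
  geom_line()

# ggplot_build() keeps one computed data frame per layer
n_layers <- length(ggplot_build(p_layers)$data)
n_tidy   <- length(ggplot_build(p_tidy)$data)
print(c(layers = n_layers, tidy = n_tidy))
```

Both plots show the same two lines; the difference is only in whether the grouping lives in the data (tidy) or in the layer definitions (constants in aes()).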