I have a question similar to this one about using multiple dataframes in a single ggplot. I would like to create a base plot and then add data from a list of dataframes (rationale/use case described below).

library(ggplot2)

# generate some data and put it in a list
df1 <- data.frame(p=c(10,8,7,3,2,6,7,8),v=c(100,300,150,400,450,250,150,400))
df2 <- data.frame(p=c(10,8,6,4), v=c(150,250,350,400))
df3 <- data.frame(p=c(9,7,5,3), v=c(170,200,340,490))
l <- list(df1,df2,df3)

#create a layer-adding function
addlayer <-function(df,plt=p){
  plt <- plt + geom_point(data=df, aes(x=p,y=v))
  plt
}

#for loop works
p <- ggplot()
for(i in l){
  p <- addlayer(i)
}

#Reduce throws an error
p <- ggplot()
gg <- Reduce(addlayer,l)
Error in as.vector(x, mode) : 
  cannot coerce type 'environment' to vector of type 'any'
Called from: as.vector(e2)

In writing out this example I realize that the for loop is not a bad option, but I wouldn't mind the conciseness of Reduce, especially if I want to chain several functions together.

For those who are interested, my use case is to draw a number of unconnected lines between points on a map. Starting from a reference dataframe, I figured the most concise approach was to generate a list of subsetted dataframes, each of which corresponds to a single line. I don't want the lines connected to each other, so geom_path is no good.

zach

1 Answer

This seems to work:

addlayer <-function(a, b){
  a + geom_point(data=b, aes(x=p,y=v))
}

Reduce(addlayer, l, init=ggplot())
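
To see why the argument order and init matter, here is a small base R sketch (no ggplot2 involved). Reduce calls the function with the accumulated value first and the next list element second, so without init the question's call starts as addlayer(df1, df2), which tries to add a layer to a data frame. The helper `show_call` is a hypothetical name used purely for illustration.

```r
# Base R only: trace how Reduce threads the accumulator through its input.
# `show_call` is a hypothetical helper that just records the call it receives.
show_call <- function(acc, x) paste0("f(", acc, ", ", x, ")")

elements <- c("df1", "df2", "df3")

# Without init, the first element itself becomes the starting accumulator.
no_init <- Reduce(show_call, elements)

# With init, the supplied value starts the fold and every element arrives as `x`.
with_init <- Reduce(show_call, elements, init = "plot")

print(no_init)
print(with_init)
```

In the second form the plot is always the first argument, which is why the rewritten addlayer takes the plot as `a` and the data frame as `b`.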

Note that you can also add a list of layers:

ggplot() + lapply(l, geom_point, mapping = aes(x=p,y=v))

However, neither of those two strategies is recommended: ggplot2 is perfectly capable of drawing multiple unconnected lines in a single layer (e.g. using the group aesthetic). That is more efficient and gives cleaner code.

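library(plyr)  # provides ldply

names(l) <- 1:3
# stack the list into one data frame; the list names end up in an .id column
m <- ldply(l, I)
ggplot(m, aes(p, v, group = .id)) + geom_line()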
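
If plyr is not at hand, the same stacking step can be sketched in base R. This is only an illustration under stated assumptions: the list `l` from the question is rebuilt here so the snippet runs on its own, and the `.id` column is filled with the list index rather than the list names.

```r
library(ggplot2)

# Same data as in the question
df1 <- data.frame(p = c(10, 8, 7, 3, 2, 6, 7, 8),
                  v = c(100, 300, 150, 400, 450, 250, 150, 400))
df2 <- data.frame(p = c(10, 8, 6, 4), v = c(150, 250, 350, 400))
df3 <- data.frame(p = c(9, 7, 5, 3), v = c(170, 200, 340, 490))
l <- list(df1, df2, df3)

# Tag each data frame with its position in the list, then stack them
m <- do.call(rbind, lapply(seq_along(l), function(i) {
  df <- l[[i]]
  df$.id <- i  # grouping column: one value per original data frame
  df
}))

# One layer; `group` keeps the three lines separate from each other
plt <- ggplot(m, aes(p, v, group = .id)) + geom_line()
```

Printing `plt` draws the plot; the grouping column keeps each line separate.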
baptiste
  • Thanks @baptiste. This answer works great, and I hadn't even considered adding a delimiter group column to my dataframe. That may indeed be the best choice. – zach Jun 05 '14 at 13:08
  • After trying to implement a column that would hold the line grouping, I ran into an issue: because a point can be connected to a variable number of other points, points do not belong to a single group. While not ideal, I think the multiple-layer approach may be the way to go. Is there a particular reason you counsel against it? – zach Jun 05 '14 at 14:49
  • I have general reasons, stated above, but most rules have their exceptions. It's hard to tell without an illustrative example (hint). – baptiste Jun 05 '14 at 16:17