Combining different data sources into one ggplot or lattice diagram

Question

In R, both the ggplot2 and lattice packages can visualize data not only by x and y position but also by an additional factor, either by changing the color, size or shape of the representation (point, smooth line, etc.) or by splitting the visualization into separate panels along this factor.

Example for ggplot:

require(ggplot2)
ggplot(diamonds, aes(x = carat, y = price, col=clarity)) +
  geom_point(alpha = .3)

Example for lattice:

require(lattice)
require(mlmRev); data(Chem97, package = "mlmRev")
densityplot(~ gcsescore | factor(score), Chem97, groups = gender,
            plot.points = FALSE, auto.key = TRUE)

These easy ways of differentiating data by another factor are designed for a single data frame containing all observations to be shown. However, more often I have separate data inputs in the form of separate data frames, with different columns to be used as x and y. The third factor for separating the plot would then be the data frame, that is, the data source itself. The only solution I have found so far is to add a column to each source data frame that holds only the name of the data source (so every cell of this column contains the same string) and then merge all the data into one data frame. ggplot2 and lattice can then separate the data again by this third factor and visualize them separately, as desired.

Now to the actual problem: this seems like a poor workflow and is not very efficient for larger amounts of data. Is there an alternative way to achieve the same result, or at least a way to efficiently automate the workflow just described?

There is no need to merge all data into a single data.frame. Each layer (geom_*) of a ggplot object can accept its own data via `data = ` and its own layer-specific x, y, ... via `mapping = aes(...)`. - mt1022, Apr 06 '17 at 13:42
See this example: http://stackoverflow.com/questions/9109156/ggplot-combining-two-plots-from-different-data-frames - mt1022, Apr 06 '17 at 13:43

score 0 · Answer 1 · answered Apr 06 '17 at 13:58

It's usually a good idea to merge multiple data sources into one when working with ggplot. There are of course exceptions to this, and ggplot provides the tools to deal with these cases.

In particular, it's possible to pass the data argument to each geom_* individually.

A general rule I use: if different data sources are used in the same geom_*, they have to be combined; if they are used in different geom_*s, they can (and maybe should) stay separate.

Bind data sources to use in the same `geom_*`

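# Two sources with identical columns
df1 <- data.frame(group = LETTERS[1:3],
                  obs = runif(3))

df2 <- data.frame(group = LETTERS[1:3],
                  obs = runif(3))

# Stack the sources; .id = 'src' adds a column holding each list name
library(dplyr)
dfT <- bind_rows(list(df1 = df1, df2 = df2), .id = 'src')

library(ggplot2)
# One line per source (in ggplot2 >= 3.4.0, use linewidth = 1 instead of size = 1)
ggplot(dfT, aes(x = group, y = obs)) +
    geom_line(aes(group = src, color = src), size = 1)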
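
The question mentions sources whose x and y columns are named differently. A minimal sketch of automating the bind step for that case, in base R only, could look like the following. The data frames `sensor_a` and `sensor_b`, their column names, and the helper `standardise()` are hypothetical and only serve as illustration; the combined frame can then be handed to either ggplot2 or lattice.

```r
library(ggplot2)
library(lattice)

# Hypothetical example data: two sources with different column names
sensor_a <- data.frame(time = 1:5, temp = c(20.1, 20.4, 21.0, 21.3, 21.9))
sensor_b <- data.frame(t = 1:4, reading = c(19.5, 19.9, 20.2, 20.8))

# Hypothetical helper: pick the x and y columns of one source, give them
# common names, and tag every row with the name of the source
standardise <- function(df, x, y, src) {
  data.frame(src = src, x = df[[x]], y = df[[y]])
}

# Stack the standardised sources into one long data frame
combined <- rbind(
  standardise(sensor_a, "time", "temp", "sensor_a"),
  standardise(sensor_b, "t", "reading", "sensor_b")
)

# ggplot2: the source column drives the color
p_gg <- ggplot(combined, aes(x = x, y = y, color = src)) +
  geom_line()

# lattice: the source column is passed as the grouping variable
p_lat <- xyplot(y ~ x, data = combined, groups = src,
                type = "l", auto.key = TRUE)
```

Printing `p_gg` or `p_lat` draws the respective plot. With many sources, the calls to `standardise()` could be generated with `Map()` or `lapply()` over a list instead of being written out by hand.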

Use different data sources

# One value per group, drawn as a line
df1 <- data.frame(group = LETTERS[1:3],
                  hValue = runif(3))

# Three values per group, drawn as points
df2 <- data.frame(group = rep(LETTERS[1:3], each = 3),
                  pValue = runif(9))

library(ggplot2)
# Each layer gets its own data and its own aesthetic mapping
ggplot() +
    geom_line(data = df1, aes(x = group, y = hValue, group = 1), size = 1) +
    geom_point(data = df2, aes(x = group, y = pValue, color = group))
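
When there are many sources that should stay separate, the layers themselves can be generated in a loop rather than typed out. The following is only a sketch, assuming ggplot2 3.0.0 or later for the `.data` pronoun; the `sources` list, its entries, and the column names are made up for illustration.

```r
library(ggplot2)

# Hypothetical sources: each entry holds a data frame plus the names of
# the columns to use as x and y
sources <- list(
  a = list(data = data.frame(time = 1:5, temp = c(20.1, 20.4, 21.0, 21.3, 21.9)),
           x = "time", y = "temp"),
  b = list(data = data.frame(t = 1:4, reading = c(19.5, 19.9, 20.2, 20.8)),
           x = "t", y = "reading")
)

# Build one geom_line layer per source; the source name is mapped to
# color so that each source gets its own legend entry
layers <- lapply(names(sources), function(src_name) {
  spec <- sources[[src_name]]
  geom_line(
    data = spec$data,
    mapping = aes(x = .data[[spec$x]], y = .data[[spec$y]], color = src_name)
  )
})

# A list of layers can be added to a plot in one step
p <- ggplot() + layers + labs(x = "x", y = "y", color = "source")
```

No combined data frame is created here, so each source keeps its original column names.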
