In R, both the ggplot2 and lattice packages can visualize data not only by x and y position but also by an additional factor, either by changing the color, size, or shape of the representation (point, smooth line, etc.) or by splitting the visualization into separate panels along this factor.
Example for ggplot2:
library(ggplot2)
# Color each point by the clarity factor of the diamonds dataset
ggplot(diamonds, aes(x = carat, y = price, col = clarity)) +
  geom_point(alpha = .3)
Example for lattice:
library(lattice)
library(mlmRev); data(Chem97, package = "mlmRev")
# One panel per score, one density curve per gender
densityplot(~ gcsescore | factor(score), data = Chem97, groups = gender,
            plot.points = FALSE, auto.key = TRUE)
Obviously, these easy ways of differentiating data by another factor are designed for a single data frame that contains all observations to be shown. More often, however, I have separate data inputs in the form of separate data frames, each with different columns to be used as x and y. The third factor for separating the plot would then be the data frame, that is, the data source itself. The only solution I have found so far is to add another column to each source data frame that contains only the data source (the same string in every row) and then merge all data into one data frame. ggplot2 and lattice can then separate the data again by this column and visualize the sources separately, as desired.
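To make this concrete, here is a minimal sketch of the manual workflow. The data frames `df_a` and `df_b`, their column names, their values, and the labels "A" and "B" are all made up for illustration; my real inputs differ, but the steps are the same.

```r
library(ggplot2)

# Hypothetical inputs: two sources with differently named x/y columns
df_a <- data.frame(time = 1:5, temp = c(2.1, 2.4, 3.0, 3.3, 3.9))
df_b <- data.frame(t = 1:4, value = c(1.8, 2.2, 2.5, 3.1))

# Step 1: map each source to common column names and add a column
# that holds only the name of the data source (same string in every row)
a <- data.frame(x = df_a$time, y = df_a$temp, source = "A")
b <- data.frame(x = df_b$t, y = df_b$value, source = "B")

# Step 2: stack all sources into one data frame
combined <- rbind(a, b)
combined$source <- factor(combined$source)

# Step 3: let ggplot2 separate the data again by the added column
p <- ggplot(combined, aes(x = x, y = y, col = source)) +
  geom_point()
```

Calling `print(p)` draws the plot. Steps 1 and 2 have to be repeated by hand for every additional source, which is the part that feels inefficient.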
Now to the actual problem: this seems to be a poor workflow, and it is not very efficient for larger amounts of data. Is there an alternative way to achieve the same result, or at least a way to efficiently automate the workflow described above?