In the question found here, the OP asks a rather simple question which, in my view, has a simple answer. However, the OP rejected that answer as too verbose, which caught my interest in the question of whether ggplot2 is unnecessarily more complex than base R.
Let's say I want to produce the following plot. I can do this in base R with the following script:
# Grid of X-axis values
x <- seq(0, 10, 0.01)
# Data
set.seed(1)
y1 <- 2 * cos(x) + 8
y2 <- 3 * sin(x) + 4
# Lines
plot(x, y1, type = "l",
     ylim = c(1, 10), ylab = "y")
lines(x, y2, type = "l", col = 2)
# Fill area between lines
polygon(c(x, rev(x)), c(y2, rev(y1)),
        col = "#6BD7AF")
Equivalently, this can be expressed in ggplot2 syntax with the following script:
library(tidyverse)
x <- seq(0, 10, 0.01)
# Data
set.seed(1)
y1 <- 2 * cos(x) + 8
y2 <- 3 * sin(x) + 4
sample_data <- tibble(x, y1, y2)

sample_data %>%
  ggplot() +
  geom_ribbon(
    aes(x = x, ymin = y1, ymax = y2),
    fill = "#6BD7AF",
    color = "black"
  ) +
  theme_classic()
In this discussion about the %>% operator there is mention of some overhead cost, with a suitable example. So I compared the two methods with microbenchmark to see whether that was also the case here. The results are below:
Unit: milliseconds
expr min lq mean median uq max neval
ggplot 2.836232 3.013435 3.350137 3.188025 3.388644 9.357495 100
baseR 20.090868 58.090015 58.993941 59.568097 60.235889 65.577104 100
On average, ggplot outperforms base R by a wide margin. The script below produces these results:
library(microbenchmark)
library(tidyverse)
base_foo <- function() {
  # Grid of X-axis values
  x <- seq(0, 10, 0.01)
  # Data
  set.seed(1)
  y1 <- 2 * cos(x) + 8
  y2 <- 3 * sin(x) + 4
  # Lines
  plot(x, y1, type = "l",
       ylim = c(1, 10), ylab = "y")
  lines(x, y2, type = "l", col = 2)
  # Fill area between lines
  polygon(c(x, rev(x)), c(y2, rev(y1)),
          col = "#6BD7AF")
}

ggplot_foo <- function() {
  # Grid of X-axis values
  x <- seq(0, 10, 0.01)
  # Data
  set.seed(1)
  y1 <- 2 * cos(x) + 8
  y2 <- 3 * sin(x) + 4
  sample_data <- tibble(x, y1, y2)

  sample_data %>%
    ggplot() +
    geom_ribbon(
      aes(x = x, ymin = y1, ymax = y2),
      fill = "#6BD7AF",
      color = "black"
    ) +
    theme_classic()
}

microbenchmark(
  ggplot = ggplot_foo(),
  baseR = base_foo()
)
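To isolate just the %>% overhead mentioned above, one could also benchmark a piped call against a direct call in isolation. A minimal sketch (the function f here is a made-up placeholder, not part of either plotting approach):

```r
library(microbenchmark)
library(magrittr)  # provides %>%

f <- function(v) v + 1  # trivial placeholder function

# The pipe adds a small fixed cost per call; it is measurable on
# trivial work like this, but negligible next to plotting itself.
microbenchmark(
  direct = f(1:100),
  piped  = 1:100 %>% f()
)
```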
ggplot2 has the advantage of being faster in this specific application, but are there cases where it would make more sense to use base R?
Namely:

- Is one more memory-efficient than the other?
- Is one more compliant with good coding practice, i.e., is it more readable across languages?
In summary, ggplot is faster. But when would base R be preferable to ggplot when dealing with, say, plots for presentations and policy recommendations?