In the question found here, the OP asks a rather simple question which, I believe, has a simple answer. However, the OP rejected that answer as too verbose, and this made me wonder whether ggplot2 is unnecessarily more complex than base R.

Let's say I want to produce the following plot:

[Image: the curves y1 and y2 plotted against x, with the area between them filled]

I can do this in base R with the following script:

# Grid of X-axis values
x <- seq(0, 10, 0.01)

# Data
set.seed(1)
y1 <- 2 * cos(x) + 8
y2 <- 3 * sin(x) + 4

# Lines
plot(x, y1, type = "l",
     ylim = c(1, 10), ylab = "y")
lines(x, y2, type = "l", col = 2)

# Fill area between lines
polygon(c(x, rev(x)), c(y2, rev(y1)),
        col = "#6BD7AF")

The equivalent plot can be expressed in ggplot2 syntax with the following script:

library(tidyverse)

x <- seq(0, 10, 0.01)

# Data
set.seed(1)
y1 <- 2 * cos(x) + 8
y2 <- 3 * sin(x) + 4



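# Collect the series in one tibble for ggplot2
sample_data <- tibble(x, y1, y2)

# Ribbon between the two curves; y2 is the lower bound and y1 the upper bound
sample_data %>%
        ggplot() +
        geom_ribbon(
                aes(x = x, ymin = y2, ymax = y1),
                fill = "#6BD7AF",
                color = "black"
        ) +
        theme_classic()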

In this discussion about the %>% operator, some overhead costs are mentioned, with a suitable example. So I compared the two methods with microbenchmark to see whether this was also the case here. The results are shown below:

Unit: milliseconds
   expr       min        lq      mean    median        uq       max neval
 ggplot  2.836232  3.013435  3.350137  3.188025  3.388644  9.357495   100
  baseR 20.090868 58.090015 58.993941 59.568097 60.235889 65.577104   100

On average, ggplot2 outruns base R by far in this benchmark. The script below produces these results:

library(microbenchmark)


base_foo <- function() {
        
        # Grid of X-axis values
        x <- seq(0, 10, 0.01)
        
        # Data
        set.seed(1)
        y1 <- 2 * cos(x) + 8
        y2 <- 3 * sin(x) + 4
        
        # Lines
        plot(x, y1, type = "l",
             ylim = c(1, 10), ylab = "y")
        lines(x, y2, type = "l", col = 2)
        
        # Fill area between lines
        polygon(c(x, rev(x)), c(y2, rev(y1)),
                col = "#6BD7AF")
        
}


ggplot_foo <- function() {
        
        # Grid of X-axis values
        x <- seq(0, 10, 0.01)
        
        # Data
        set.seed(1)
        y1 <- 2 * cos(x) + 8
        y2 <- 3 * sin(x) + 4
        
        
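        sample_data <- tibble(x, y1, y2)

        # Build the plot object (it is returned to the caller,
        # not printed inside this function)
        sample_data %>%
                ggplot() +
                geom_ribbon(
                        # y2 is the lower bound and y1 the upper bound
                        aes(x = x, ymin = y2, ymax = y1),
                        fill = "#6BD7AF",
                        color = "black"
                ) +
                theme_classic()

}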


microbenchmark(
        ggplot = ggplot_foo(),
        baseR  = base_foo()
)

ggplot2 has the advantage of being faster in this specific application, but are there cases where it would make more sense to use base R?

Namely,

  1. Is one more memory-efficient than the other?
  2. Is one more compliant with good coding practice, i.e. is it more readable across languages?

In summary, ggplot2 is faster. But when would base R be preferable to ggplot2 when dealing with, say, plots for presentations and policy recommendations?

Serkan
  • I suspect this will get closed as off-topic, but FWIW the main reason I can think of to use base `R` is that installing packages (or software that those packages depend on) is difficult/impossible, for example on a server environment. – Phil Aug 14 '21 at 18:55
  • Yeah - I'm suspecting that too... I have a hard time focusing this, to be honest... What you say makes sense! But how about the 'universal' rules of coding? Like, it is good practice to add extra white space after `{` - can we say that the syntax follows good practice? – Serkan Aug 14 '21 at 18:57
  • To stop this from being closed as opinion-based, you should define 'better'. Faster? More memory efficient? etc. – user438383 Aug 14 '21 at 19:03
  • I gave it a shot - feel free to edit it, if you want! :-) – Serkan Aug 14 '21 at 19:13
  • `ggplot2` (with non-trivial plots) is almost always slower, sometimes by an order of magnitude. Everything one can do with `ggplot2` (with its feature-full premise of aesthetics, geometries, layers, faceting, etc) can be done in base graphics, but at a cost of manually controlling all of the components that go into it. For instance, one can do `aes(color=z)` and if `z` is categorical or if `z` is numeric, then ggplot2 does the right thing; in base graphics, the programmer is responsible for enumerating and/or creating gradients scaled to the data in question. – r2evans Aug 14 '21 at 19:18
  • In my experience base R is substantially faster than ggplot2 for e.g. large scatter plots with over 1m points. – user438383 Aug 14 '21 at 19:24
  • To better inform your question about "memory-efficient", the `bench::mark` function tracks memory allocations, so it should be at least somewhat informative. *"Good coding practice"* is never going to dictate `ggplot2` over base-graphics; similarly, *"readability"* is completely subjective. For example, `data.table` is effectively another dialect of R that I had the hardest time reading and therefore resisted using, despite the fact that it was significantly faster and more memory-efficient. Once I learned it, I found its readability rather intuitive. (Eye-of-the-beholder.) – r2evans Aug 14 '21 at 21:51
  • (To be clear, *"Good coding practice"* should never dictate one over the other. "Good" depends on the requirements and proficiency.) – r2evans Aug 14 '21 at 22:04
  • A small note on the benchmarking: the base R code prints the plot to the device, whereas your ggplot2 code doesn't draw the plot, leading to an underestimate of ggplot2's time. You're right that ggplot2 is more complex. I find the main benefits of ggplot2 to be the structured grammar and automatically expanding colour and position scales, legend positioning, as well as an intuitive way of faceting the data. In addition, ggplot2's extensions are quite rich. I mostly use base R plots when I quickly need to plot two variables against one another during exploration. – teunbrand Aug 15 '21 at 13:12
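
The two points raised in the comments (the ggplot2 plot is never drawn in the question's benchmark, and `bench::mark` can track memory allocations) can be combined in a rough sketch like the one below. It is only one possible setup, not the asker's method: the helper names `base_draw` and `ggplot_draw` and the `plot_data` data frame are made up for illustration, the data is built once outside the timed code, and both plots are sent to the same file-less `pdf(NULL)` device. The `mem_alloc` column is only filled in when R was built with memory profiling support, and timings will differ between machines, so no result is claimed here.

```r
library(ggplot2)

# Shared data, built once so that only the plotting itself is measured
x <- seq(0, 10, 0.01)
plot_data <- data.frame(x = x, y1 = 2 * cos(x) + 8, y2 = 3 * sin(x) + 4)

# Hypothetical helper: base graphics version, drawn on the active device
base_draw <- function(d) {
  plot(d$x, d$y1, type = "l", ylim = c(1, 10), xlab = "x", ylab = "y")
  lines(d$x, d$y2, col = 2)
  polygon(c(d$x, rev(d$x)), c(d$y2, rev(d$y1)), col = "#6BD7AF")
  invisible(NULL)
}

# Hypothetical helper: ggplot2 version; print() forces the plot to be
# built and rendered, which the benchmark in the question does not do
ggplot_draw <- function(d) {
  p <- ggplot(d) +
    geom_ribbon(aes(x = x, ymin = y2, ymax = y1),
                fill = "#6BD7AF", color = "black") +
    theme_classic()
  print(p)
  invisible(NULL)
}

pdf(NULL)  # same device for both methods, no file is written
res <- bench::mark(
  baseR  = base_draw(plot_data),
  ggplot = ggplot_draw(plot_data),
  check = FALSE,        # the return values are not compared
  min_iterations = 20
)
dev.off()

print(res)  # see the median and mem_alloc columns
```

Which method comes out ahead in this setup has to be read from the output on your own machine; the device used (screen, PNG, PDF) can also shift the numbers.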

0 Answers