Magrittr pipe in R functions

Question

Are there cases where it is not advantageous to use the magrittr pipe inside of R functions from the perspectives of (1) speed, and (2) ability to debug effectively?

Are you asking about execution speed or speed in terms of development time? — Dason, Jul 17 '17 at 17:28
Either way, some people would ask you whether there are any cases where using magrittr pipes is advantageous in terms of speed or debug ability... — Dason, Jul 17 '17 at 17:29
The only way to answer this is to set up several examples and benchmark them. Have you tried doing that? — Hack-R, Jul 17 '17 at 17:34
I don't use pipes. So use of pipes inside a function means I won't debug it. I can't judge if that counts as a disadvantage to you. I believe an R-core member once called code with magrittr pipes "mumbo jumbo" or something like that in an email on the R-devel list. — Roland, Jul 17 '17 at 17:36

score 7 · Accepted Answer · edited Jun 20 '20 at 09:12

There are advantages and disadvantages to using a pipe inside of a function. The biggest advantage is that it's easier to see what's happening within a function when you read the code. The biggest downsides are that error messages become harder to interpret and the pipe breaks some of R's rules of evaluation.

Here's an example. Let's say we want to make a pointless transformation to the mtcars dataset. Here's how we could do that with pipes...

library(tidyverse)
tidy_function <- function() {
  mtcars %>%
    group_by(cyl) %>%
    summarise(disp = sum(disp)) %>%
    mutate(disp = (disp^5) / 10000000000)
}

You can clearly see what's happening at every stage, even though it's not doing anything useful. Now let's look at the same code using the Dagwood Sandwich approach...

base_function <- function() {
  mutate(summarise(group_by(mtcars, cyl), disp = sum(disp)), disp = (disp^5) / 10000000000)
}

Much harder to read, even though it gives us the same result...

all.equal(tidy_function(), base_function())
# [1] TRUE

The most common way to avoid using either a pipe or a Dagwood Sandwich is to save the results of each step to an intermediate variable...

intermediate_function <- function() {
  x <- mtcars
  x <- group_by(x, cyl)
  x <- summarise(x, disp = sum(disp))
  mutate(x, disp = (disp^5) / 10000000000)
}

More readable than the last function and R will give you a little more detailed information when there's an error. Plus it obeys the traditional rules of evaluation. Again, it gives the same results as the other two functions...

all.equal(tidy_function(), intermediate_function())
# [1] TRUE
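To get a feel for the debugging trade-off, here is a minimal sketch that plants the same typo in a piped and an intermediate-variable version and captures both errors. The names broken_pipe and broken_intermediate are hypothetical, and the sketch loads dplyr directly instead of the whole tidyverse. The exact wording of the messages depends on your dplyr and magrittr versions, so no output is shown.

```r
library(dplyr)

# Hypothetical variants of the functions above, each with a deliberate
# typo ("dsp" instead of "disp") in the summarise step.
broken_pipe <- function() {
  mtcars %>%
    group_by(cyl) %>%
    summarise(disp = sum(dsp)) %>%
    mutate(disp = (disp^5) / 10000000000)
}

broken_intermediate <- function() {
  x <- mtcars
  x <- group_by(x, cyl)
  x <- summarise(x, disp = sum(dsp))
  mutate(x, disp = (disp^5) / 10000000000)
}

# Capture the errors instead of stopping, so the two can be compared
pipe_error <- tryCatch(broken_pipe(), error = function(e) e)
intermediate_error <- tryCatch(broken_intermediate(), error = function(e) e)

conditionMessage(pipe_error)
conditionMessage(intermediate_error)
```

When working interactively, call each function without tryCatch() and run traceback() right after the failure to compare how many frames you have to read through in each style.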

You specifically asked about speed, so let's compare these three functions by running each of them 1000 times...

library(microbenchmark)
timing <-
  microbenchmark(tidy_function(),
                 intermediate_function(),
                 base_function(),
                 times = 1000L)
timing
# Unit: milliseconds
#                     expr      min       lq     mean   median       uq       max neval cld
#          tidy_function() 3.809009 4.403243 5.531429 4.800918 5.860111  23.37589  1000   a
#  intermediate_function() 3.560666 4.106216 5.154006 4.519938 5.538834  21.43292  1000   a
#          base_function() 3.610992 4.136850 5.519869 4.583573 5.696737 203.66175  1000   a

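Even in this trivial example, the pipe is a tiny bit slower than the other two options: its median is about 0.3 ms (roughly 6%) higher than the intermediate-variable version. Note that all three rows share the same letter in the cld column, so the benchmark does not rank the differences as statistically significant.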
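To put the size of the gap in perspective, the relative overhead can be worked out from the medians in the table. This is just arithmetic on the numbers posted above, not a new measurement, and the variable names are made up for the illustration.

```r
# Median timings in milliseconds, copied from the benchmark output above
medians <- c(tidy = 4.800918, intermediate = 4.519938, base = 4.583573)

# Overhead of each approach relative to the fastest median, in percent
overhead_pct <- (medians / min(medians) - 1) * 100
round(overhead_pct, 1)
# tidy is about 6.2, intermediate is 0, base is about 1.4

# Absolute gap between the pipe and the fastest version, in milliseconds
medians[["tidy"]] - min(medians)
```

A gap of roughly 0.28 ms per call only matters if the function sits inside a hot loop that runs many thousands of times.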

Conclusion

Feel free to use the pipe in your functions if it's the most comfortable way for you to write code. If you start running into problems or if you need your code to be as fast as humanly possible, then switch to a different paradigm.

Your Dagwood sandwich just needs some line breaks and indentation and it's very readable. — Roland, Jul 17 '17 at 19:34
Does anyone know any packages where the pipe operator is used? I haven't seen it used except in script examples. — user2506086, Jul 17 '17 at 21:20
To me it seems like piping is closer to how you would write the instructions to do something, so I'd like to use it in cases where I want my code to be very readable. On the other hand I get the difficulty debugging. If you try to debug into a pipe you have to make your way through the `%>%` function, and figure out which line to step into. Seems like if I write a package for other folks to use, whether I use the pipe is going to be a balance between how natural I want it to be to read the code, and how easy I want it to be to debug the code. — user2506086, Jul 17 '17 at 21:38
Your question was about using the pipe in functions, not necessarily packages. I use the pipe in my functions and I even use it in my packages, but only in cases where I don't anticipate getting any errors. If you want to see how packages use the pipe, you can search Github: https://github.com/search?utf8=%E2%9C%93&q=%22%40importFrom+magrittr%22&type=Code — Andrew Brēza, Jul 18 '17 at 12:50
Pipes don't add temporary variables to the workspace, which is something I like a lot, but for a function in production that's not a criterion. As I understand it, pipes can be faster than the intermediate-values method in some cases because they use lazy evaluation most of the time. — moodymudskipper, Jul 19 '17 at 06:28
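On the debugging concern raised in the comments: one way to inspect a pipe without stepping through `%>%` itself is to drop a pass-through step into the chain. The sketch below assumes a hypothetical helper called peek, which is not part of magrittr or dplyr, and a traced copy of the answer's function.

```r
library(dplyr)

# Hypothetical helper (not part of magrittr or dplyr): reports on the value
# flowing through the pipe and returns it unchanged.
peek <- function(x, label) {
  message(label, ": ", nrow(x), " rows, ", ncol(x), " columns")
  x
}

# Traced copy of the piped function, with peek() inserted at two stages
tidy_function_traced <- function() {
  mtcars %>%
    peek("input") %>%
    group_by(cyl) %>%
    summarise(disp = sum(disp)) %>%
    peek("after summarise") %>%
    mutate(disp = (disp^5) / 10000000000)
}

result <- tidy_function_traced()
```

Swapping the message() call for browser() pauses execution at that stage with the intermediate value available as x, which gets close to the step-by-step inspection that intermediate variables offer.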
