Should I avoid programming packages with pipe operators?

Question

Are there any objective reasons for why pipe operators from the R package magrittr, such as %>%, should be avoided when I program packages in R?

More specifically, I want to know if using pipe operators might cause coding conflicts or (positively or negatively) affect performance. I am looking for specific, concrete examples of such cases.

opinion-based (and flamebait, unintentional or not) ... Opinions differ widely. I like pipes in some contexts but personally feel that bending over backward to do *everything* with pipes gets silly sometimes, e.g. http://stackoverflow.com/questions/27053935/pipe-in-magrittr-package-is-not-working-for-function-load . — Ben Bolker, Aug 10 '16 at 17:55
You should avoid it if it doesn't fit your programming style and is not useful to you. If you like piping use it. — Roland, Aug 10 '16 at 18:06
If you really want to ask this question, you could probably rewrite it focused on the second half of your question, e.g. "are there potential errors or conflicts that can occur in pipe-based programming using `magrittr`, analogous to the ones that can occur when using NSE? I am looking for specific, concrete examples rather than general opinions about the usefulness or wisdom of using pipes." — Ben Bolker, Aug 10 '16 at 18:33
I edited the question, thank you. It was never intended to ask for opinions on style. — Johan Larsson, Aug 10 '16 at 18:37
The pipe is slightly slower than nothing at all, as you're calling a function instead of...not. In most cases, that cost is very small, though, and is generally offset by the significantly less time you (or others) will spend reading your code. Ultimately, there are great R coders who use it, and great R coders who don't. — alistaire, Aug 10 '16 at 20:33

eddi · Accepted Answer · 2016-08-10T19:50:00.253

44

Like all advanced functions written in R, %>% carries a lot of overhead, so don't use it in loops (this includes implicit loops, such as the *apply family, or the per group loops in packages like dplyr or data.table). Here's an example:

library(magrittr)
x = 1:10

system.time({for(i in 1:1e5) identity(x)})
#   user  system elapsed 
#   0.07    0.00    0.08 
system.time({for(i in 1:1e5) x %>% identity})
#   user  system elapsed 
#  15.39    0.00   16.68
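
The timings above depend heavily on the magrittr version and on the machine, so it is worth re-running the comparison locally. A minimal sketch, assuming R 4.1.0 or later (for the native `|>`) and treating magrittr as optional. No expected timings are shown because they will differ from system to system:

```r
x <- 1:10
n <- 1e5

# Plain call and native pipe (the native pipe needs R >= 4.1.0).
t_plain  <- system.time(for (i in seq_len(n)) identity(x))
t_native <- system.time(for (i in seq_len(n)) x |> identity())

timings <- rbind(plain = t_plain[1:3], native = t_native[1:3])

# magrittr is only timed if it happens to be installed.
if (requireNamespace("magrittr", quietly = TRUE)) {
  `%>%` <- magrittr::`%>%`
  t_magrittr <- system.time(for (i in seq_len(n)) x %>% identity())
  timings <- rbind(timings, magrittr = t_magrittr[1:3])
}

print(timings)
```
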

edited Aug 10 '16 at 19:50

answered Aug 10 '16 at 19:44

eddi


I didn't realize the pipe operator has so much overhead until seeing this post! – Haizi Aug 27 '20 at 17:51
As of magrittr 2.0.0, the performance of the pipe operator is much improved. I get 0.036 and 0.291 seconds respectively for the two benchmarks above now. – Johan Larsson Dec 07 '20 at 14:50
And now the native pipe operator `|>` added in R version 4.1.0 has no overhead. – Mikko Marttila Mar 09 '22 at 16:05

Gregor Thomas · Answer 2 · 2022-05-09T16:11:02.557

Adding dependencies to a package shouldn't be taken too lightly. Speaking generally, every package that your package depends on is a risk for future maintenance whenever the dependency updates, or in case the dependency stops being maintained. It also makes it (slightly) harder for people to install your package - though only noticeably so in cases where an internet connection is unreliable or in some cases where some packages are more difficult to install on certain systems or hardware. But if someone wants to put your package on a thumb drive to install somewhere, they will also need to make sure they have all of your dependencies (and the dependencies of your dependencies...).

Base R and the default packages have a long history, and R-Core is very conscious of not introducing changes that will break downstream dependencies. magrittr is much newer; it looks like it first appeared on CRAN in Feb 2014.

Practically speaking, magrittr has been stable and seems like a low risk dependency. Especially if you are importing just %>% and ignoring the more esoteric operators it provides (as is done by dplyr, tidyr, et al.) you are probably quite safe. Its popularity almost guarantees that even if its creator abandons it, someone will take over the maintenance.
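
As a rough illustration of "importing just %>%", here is a sketch of what that might look like in a package that uses roxygen2. The function name `sum_abs` is hypothetical, and the DESCRIPTION line is shown only as a comment:

```r
# In DESCRIPTION (not R code):
#   Imports: magrittr

# In a file under R/. With roxygen2, regenerating the documentation
# writes an importFrom() directive for %>% into NAMESPACE.

#' @importFrom magrittr %>%
NULL

# Hypothetical package function that uses the imported operator.
sum_abs <- function(x) {
  x %>% abs() %>% sum()
}
```

Only the one operator is imported this way, so the rest of magrittr's exports stay out of the package namespace.
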

Now in 2022 we've had a couple R releases featuring the base pipe |>, so there's a nice alternative with 0 dependencies as long as you can run R version 4.1.0 or greater.
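
A small sketch of why the base pipe adds no dependency, assuming R 4.1.0 or later: the parser rewrites `lhs |> f()` into the ordinary call `f(lhs)`, so no operator function is involved at run time. Note that, unlike `%>%`, the right-hand side must be written as a call with parentheses.

```r
# Requires R >= 4.1.0; no packages are loaded.
piped  <- quote(x |> identity())
nested <- quote(identity(x))

print(piped)  # shows the rewritten call
stopifnot(identical(piped, nested))

# Ordinary use, with one named argument.
x <- 1:10
result <- x |> rev() |> head(n = 3)
stopifnot(identical(result, c(10L, 9L, 8L)))
```
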

Similarly, from a teaching perspective, more dependencies cause more problems with platform-specific loading issues (Mac vs PC vs Linux, current vs older OS, different chips, etc.), questions about warnings and error messages, etc. I teach vanilla R, and look at the dependencies of packages I think about using. — N Brouwer, May 07 '22 at 17:36

IRTFM · Answer 3 · 2016-08-13T18:08:19.947

The piping paradigm inverts the apparent order of function application in comparison with "standard functional programming". Whether this has adverse consequences depends on the function semiotics (my original mispledding was intended to be 'semantics' but the spielchucker thought I meant semiotics and that seemed OK). I happen to think piping creates code that is less readable, but that is because I have trained my brain to look at coding from the "inside-out". Compare:

 y <- func3 ( func2( func1( x) ) )

 y <- x %>% func1 %>% func2 %>% func3

To my way of thinking the first one is more readable since the information "flows" outward (and consistently leftward) and ends up in the leftmost position y, whereas the information in the second one flows to the right and then "turns around" and is sent to the left. The piping paradigm also allows argument-less function application, which I think increases the potential for error. R programming with only positional parameter matching often produces error messages that are totally inscrutable, whereas disciplining yourself to always (or almost always) use argument names has the benefit of much more informative error messages.
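
For a concrete version of the comparison, here is a minimal sketch using base functions in place of the placeholder `func1`, `func2`, `func3`. It uses the native `|>` (an assumption: R 4.1.0 or later) rather than magrittr's `%>%`, and names the one extra argument in both forms:

```r
x <- c(4, 9, 16)

# Nested, "inside-out" form with the extra argument named.
nested <- round(sum(sqrt(x)), digits = 1)

# Piped form: each step receives the previous result as its first argument.
piped <- x |> sqrt() |> sum() |> round(digits = 1)

stopifnot(identical(nested, piped))
```
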

My preference would have been for a piping paradigm that had a consistent direction:

 y <- func3 %<% func2 %<% func1 %<% x
 # Or
 x %>% func1 %>% func2 %>% func3 -> y

And I think this was actually part of the original design of pkg-magrittr which I believe included a 'left-pipe' as well as a 'right-pipe'. So this is probably a human-factors design issue. R has left to right associativity and the typical user of the dplyr/magrittr piping paradigm generally obeys that rule. I probably have stiff-brain syndrome, and all you young guys are probably the future, so you make your choice. I really do admire Hadley's goal of rationalizing data input and processing so that files and SQL servers are seen as generalized serial devices.
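
To make the associativity point concrete, here is a sketch of a hypothetical `%<%`. It is not the implementation used by magrittr or any other package. Because user-defined `%op%` operators group left to right, `func3 %<% func2 %<% func1 %<% x` is parsed as `((func3 %<% func2) %<% func1) %<% x`, so a naive "apply left to right" definition would call `func3(func2)` first. The sketch composes functions until it meets a non-function value. One limitation is that it cannot pipe a value that is itself a function.

```r
# Hypothetical backward pipe; not the implementation of any package.
`%<%` <- function(lhs, rhs) {
  force(lhs)
  force(rhs)
  if (is.function(rhs)) {
    # function %<% function: build the composition lhs(rhs(...))
    function(...) lhs(rhs(...))
  } else {
    # function %<% value: apply the function to the value
    lhs(rhs)
  }
}

y <- sqrt %<% sum %<% abs %<% c(-4, -5)

# Rightward flow ending in a right-assignment (native pipe, R >= 4.1.0).
c(-4, -5) |> abs() |> sum() |> sqrt() -> y2

stopifnot(identical(y, 3), identical(y, y2))
```
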

The example offered by David Robinson suggests that keeping track of arguments is a big issue, and I agree completely. My usual approach is using tabs and spaces to highlight the hierarchy:

func3 ( func2( 
           func1(x, a),    # think we need an extra comma here
               b, c),       # and here
        d, e, f) 

x %>% func1(a) %>% func2(b, c) %>% func3(d, e, f)

Admittedly this is made easier with a syntax-aware editor when checking for missing commas or parentheses, but in the example above which was not done with one, the stacking/spacing method does highlight what I think was a syntax error. (I also quickly add argument names when having difficulties, but I think that would be equally applicable to piping code tactics.)

Some people do use `->` with pipes, and it drives me crazy! To each their own, I suppose. — Gregor Thomas, Aug 10 '16 at 23:53
I was aware that `->` was parseable R. I'm just saying that if you use a rightward data flow that a right-assign would seem more consistent. — IRTFM, Aug 10 '16 at 23:55
"stiff-brain syndrome": I'm with you. The part of my brain I use for programming just doesn't work in a way where piping produces more readable code for me. I could probably retrain it, but don't see the advantage. Luckily, most of R-core seems to be in the same situation. — Roland, Aug 11 '16 at 06:09
I think the listed example does stack the deck a little in having each function take only one argument. Which is clearer: `func3 ( func2( func1(x, a) b, c) d, e, f)` or `x %>% func1(a) %>% func2(b, c) %>% func3(d, e, f)` in terms of which argument goes with which function? — David Robinson, Aug 12 '16 at 12:34
In addition to the remark made by @DavidRobinson, piping is much more forgiving when it comes to changing code after it's written or rearranging the order of functions. — Johan Larsson, Aug 13 '16 at 17:18
The package `backpipe` provides `%<%`, the vignette shows how it plays neatly with `shiny` html creation functions. https://cran.r-project.org/web/packages/backpipe/index.html — moodymudskipper, Nov 30 '17 at 10:05
@DavidRobinson (and others) The Lisp language development environments generally offer a syntax-aware "pretty-printing" function that will display composite function calls using extra linefeeds and indentations as I showed. I suspect someone has done the same for R. Perhaps such a function has been requested and satisfied in the Rhelp Archives? — IRTFM, Nov 30 '17 at 19:23
