
Since R version 4.1.0, the native pipe |> is part of stable R. For passing the lhs into an argument other than the first, the examples in the manual (?pipeOp) show:

mtcars |> subset(cyl == 4) |> (function(d) lm(mpg ~ disp, data = d))()

or, using the \(x) shorthand for function(x):

mtcars |> subset(cyl == 4) |> (\(d) lm(mpg ~ disp, data = d))()
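A quick way to see why the anonymous function is needed (a sketch, assuming R ≥ 4.1.0; the object names are illustrative): |> is a purely syntactic transformation, so the parser rewrites lhs |> f(...) into f(lhs, ...) and the lhs always lands in the first argument. quote() exposes the rewritten call, and routing the lhs through a lambda gives the same model as the hand-written nested call.

```r
# The parser rewrites the pipe before evaluation, so quote() already
# holds the nested call with the lhs as the first argument.
piped  <- quote(mtcars |> subset(cyl == 4) |> head(3))
nested <- quote(head(subset(mtcars, cyl == 4), 3))
identical(piped, nested)  # TRUE

# Routing the lhs into `data` through an anonymous function gives the
# same model as writing the nested call by hand.
fit_lambda <- mtcars |> subset(cyl == 4) |> (\(d) lm(mpg ~ disp, data = d))()
fit_nested <- lm(mpg ~ disp, data = subset(mtcars, cyl == 4))
all.equal(coef(fit_lambda), coef(fit_nested))  # TRUE
```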

Or use pipebind (=>), which currently needs to be activated via an environment variable:

Sys.setenv(`_R_USE_PIPEBIND_` = TRUE) 
mtcars |> subset(cyl == 4) |> . => lm(mpg ~ disp, data = .)

Instead of |>, the so-called Bizarro pipe ->.; could also be used:

mtcars |> subset(cyl == 4) ->.; lm(mpg ~ disp, data = .)
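To make the mechanics explicit: ->.; is not an operator at all but a right assignment to a variable named ., followed by ;, R's statement separator. The sketch below (assuming R ≥ 4.1.0 for |>; the fit_* names are illustrative) shows that the one-liner is equivalent to two ordinary statements and that . survives afterwards.

```r
# `->.;` is a right assignment to `.` followed by `;` (statement separator).
mtcars |> subset(cyl == 4) ->.; fit_bizarro <- lm(mpg ~ disp, data = .)

# `.` is still there after the "pipeline" has finished.
exists(".")  # TRUE
nrow(.)      # 11 (the 4-cylinder cars)

# The same computation written as two ordinary statements.
. <- subset(mtcars, cyl == 4)
fit_plain <- lm(mpg ~ disp, data = .)
all.equal(coef(fit_bizarro), coef(fit_plain))  # TRUE
```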

One purpose of pipe notation in R is to let a nested sequence of calls be written so that the sequence of processing steps is easier to follow. At least for me, ->.; fulfils this too. The Bizarro pipe is not really a pipe, but for me it is currently a welcome alternative to |>, especially when passing the lhs into an argument other than the first. However, whenever I use it, I get comments telling me not to.

So I want to know: does the Bizarro pipe have disadvantages that argue against using it?


So far I see that it creates or overwrites . in the environment and keeps this reference, which forces a copy on modification. But calling a function with data as an argument also creates a reference to that data, and after a for loop the loop variable also remains in the environment:

for(i in iris) {}
tracemem(i) == tracemem(iris[[ncol(iris)]])
#[1] TRUE

Performance-wise, it also shows no major disadvantage:

x <- 42
library(magrittr)
Sys.setenv(`_R_USE_PIPEBIND_` = TRUE) 
# Trivial operation, only used to measure the overhead of each pipe style
bench::mark(x
, identity(x)
, "x |> identity()" = x |> identity()
, "x |> (\\(y) identity(y))()" = x |> (\(y) identity(y))()
, "x |> . => identity(.)" = x |> . => identity(.)
, "x ->.; identity(.)" = {x ->.; identity(.)}
, x %>% identity
)
#  expression                     min   median `itr/sec` mem_alloc `gc/sec` n_itr
#  <bch:expr>                <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int>
#1 x                          60.07ns  69.03ns 13997474.        0B      0   10000
#2 identity(x)               486.96ns 541.91ns  1751206.        0B    175.   9999
#3 x |> identity()           481.03ns 528.06ns  1812935.        0B      0   10000
#4 x |> (\(y) identity(y))() 982.08ns   1.08µs   854349.        0B     85.4  9999
#5 x |> . => identity(.)     484.06ns 528.06ns  1815336.        0B      0   10000
#6 x ->.; identity(.)        711.07ns 767.99ns  1238658.        0B    124.   9999
#7 x %>% identity              2.86µs   3.23µs   294945.        0B     59.0  9998
marc_s
GKi
  • This question seems singularly focussed on performance. As I explain in my answer, performance isn’t the issue. And I’m somewhat puzzled that you would immediately think of that. – Konrad Rudolph Jun 07 '21 at 09:10
  • One can also write: `mtcars |> subset(cyl == 4) |> lm(formula = mpg ~ disp)` . I think the reason performance is the focus is that it seems to have been the main motivating factor behind |> and it gave up most of the features of %>% and the bizarro pipe in order to implement it via syntax transform. I doubt that the side effect argument regarding bizarro pipe is really of practical importance. – G. Grothendieck Jun 07 '21 at 09:44
  • @G.Grothendieck Yes, a placeholder can be avoided when all arguments before the target one are supplied by name. But sometimes this means a lot of extra typing. – GKi Jun 07 '21 at 09:49
  • @G.Grothendieck It is my hope to convince people that this *is* of practical importance: intentionally writing less robust code than would be possible is just an *incredibly* bad idea: remember the software engineering adage that [“one in a million is next Tuesday”](https://learn.microsoft.com/en-us/archive/blogs/larryosterman/one-in-a-million-is-next-tuesday): unlikely-sounding bugs cause issues *all the time* because of scale. And the chance of hidden bugs that silently produce the wrong result should always be minimised. – Konrad Rudolph Jun 07 '21 at 13:40
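The named-argument approach from the comments works because once formula is matched by name, the piped lhs becomes the first unmatched positional argument of lm(), which is data. A small check, assuming R ≥ 4.1.0 (object names are illustrative):

```r
# With `formula` named, the piped lhs fills the next free argument: `data`.
fit_named  <- mtcars |> subset(cyl == 4) |> lm(formula = mpg ~ disp)
fit_nested <- lm(mpg ~ disp, data = subset(mtcars, cyl == 4))
all.equal(coef(fit_named), coef(fit_nested))  # TRUE
```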

1 Answer


The main issue with the bizarro pipe is that it creates hidden side-effects and makes it easier to create subtle bugs. It decreases code maintainability.

The persistent existence of the . variable makes it all too easy to accidentally refer to this value later down the line: its presence masks mistakes if you at some point forget to assign to it and think you did. It’s easy to dismiss this possibility but such errors are fairly common and, worse, very non-obvious: you won’t get an error message, you’ll just get a wrong result. By contrast, if you forget the pipe symbol somewhere, you’ll get an immediate error message.
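A minimal sketch of the failure mode just described (names such as fit_4 and fit_6 are illustrative, and it assumes R ≥ 4.1.0 and that no object called mpg exists in the global environment): a forgotten ->.; leaves the previous value of . in place, so the second model is silently fitted to the wrong data, whereas a call that receives no data at all fails loudly.

```r
# Earlier step: `.` now holds the 4-cylinder cars.
mtcars |> subset(cyl == 4) ->.; fit_4 <- lm(mpg ~ disp, data = .)

# Later step, meant for 6-cylinder cars, but `->.;` was forgotten:
# the subset is computed and discarded, and lm() picks up the stale `.`.
subset(mtcars, cyl == 6)
fit_6 <- lm(mpg ~ disp, data = .)
identical(coef(fit_4), coef(fit_6))  # TRUE: no error, just a wrong model

# By contrast, a forgotten `|>` leaves lm() without any data, which
# fails immediately because `mpg` cannot be found.
err <- tryCatch(lm(formula = mpg ~ disp), error = function(e) e)
inherits(err, "error")  # TRUE
```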

Worse, the bizarro pipe hides this error-prone side-effect in two different ways. First, it makes the assignment non-obvious. I’ve argued previously that -> assignment shouldn’t be used since left-to-right assignment hides a side-effect, and side-effects should be made syntactically obvious. The side-effect in this case is the assignment, and it should happen where it’s most prominent: in the first column of the expression, not hidden away at its end. This is a fundamental objection to the use of -> (or any other attempt to mask side-effects), not limited to the bizarro pipe.

Second, because . is hidden by default (from ls() and from the inspector pane in IDEs), it is even easier to accidentally depend on it.
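To illustrate how easily . goes unnoticed (a sketch; IDE behaviour is not shown here): ls() omits names starting with a dot unless all.names = TRUE, although the object is still there.

```r
. <- subset(mtcars, cyl == 4)

"." %in% ls()                  # FALSE: ls() skips names that start with "."
"." %in% ls(all.names = TRUE)  # TRUE
exists(".")                    # TRUE: the object is there all the same
```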

Therefore, if you want to assign to a temporary name instead of using a pipe, just do that. But:

  1. Perform right-to-left assignment, i.e. use name = value or name <- value, not value -> name.
  2. Use a descriptive name.
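Applied to the example from the question, the two rules above might look like this (four_cyl_cars is just an illustrative name):

```r
# Right-to-left assignment to a descriptive temporary name.
four_cyl_cars <- subset(mtcars, cyl == 4)
fit <- lm(mpg ~ disp, data = four_cyl_cars)
```

The assignment now sits in the first column, and a typo in four_cyl_cars produces an "object not found" error instead of silently reusing some earlier value.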

I can’t stress enough that this is an actual source of subtle bugs — don’t underestimate it!

Another issue is that its use breaks editor support for auto-formatting code. This is a “solvable issue” in some IDEs via plugins but the solution, as it were, solves an issue that should not even exist. To clarify what I mean, if you’re using the bizarro pipe you’d presumably want a hanging indent, i.e. something along these lines:

mtcars ->.;
  subset(., cyl == 4) ->.;
  lm(mpg ~ disp, data = .)

… but auto-indentation won’t indent the code like this, and auto-formatters will flatten the hanging indent.

Neither of these issues is prohibitive (though the first is quite serious), but in the absence of a positive argument for using the bizarro pipe they tip the balance decisively. After all, what problem does the bizarro pipe solve that isn’t better solved by a proper pipeline operator¹ or by regular assignment? If you can’t use R 4.1, use ‘magrittr’. If you don’t like the semantics of ‘magrittr’, write your own pipe operator, use one of the many other existing implementations, or just use regular assignment.
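As an illustration of the "write your own pipe operator" option, here is a minimal sketch of a hypothetical %dot>% operator (not from any package) with explicit dot substitution. It evaluates the rhs in a fresh child environment in which . is bound to the lhs, so, unlike ->.;, nothing is left behind in the caller's environment. It is a sketch only: there is no error handling and no magrittr-style first-argument insertion.

```r
# Hypothetical pipe with explicit dot substitution (not from any package).
`%dot>%` <- function(lhs, rhs) {
  rhs_expr <- substitute(rhs)                  # capture the rhs unevaluated
  dot_env <- new.env(parent = parent.frame())  # child of the caller's env
  assign(".", lhs, envir = dot_env)            # bind the placeholder there
  eval(rhs_expr, dot_env)                      # evaluate the rhs with `.` visible
}

fit <- mtcars %dot>% subset(., cyl == 4) %dot>% lm(mpg ~ disp, data = .)
coef(fit)

# `.` only existed inside dot_env, so the caller's environment stays clean
# (FALSE in a session where `.` was never assigned).
exists(".", inherits = FALSE)
```

Because user-defined %op% operators are left-associative, each step receives the already evaluated result of the previous one.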

Lastly, one might argue that this code is sufficiently unusual to trip up readers, but honestly I don’t think that’s a very compelling argument if the usage is consistent and clearly documented somewhere. Still, it is one more reason not to recommend it to beginners.


1 Of course that’s easy to answer: |> does not allow explicit dot substitution. And while I understand the arguments against supporting it, the fact that its absence encourages hacks such as the bizarro pipe is a very strong argument that this was in fact a huge mistake.

Konrad Rudolph
  • Thanks for your in-depth answer! I recently found out that there is a pipebind feature, which needs to be activated and allows substitution. I have included it in the question; it helps me avoid ->.;. – GKi Jun 08 '21 at 07:41
  • @GKi I couldn’t find anything in the mailing list, but I think the fact that pipebind requires activation via an environment variable means that the feature is experimental and might be removed in future versions. Approximately nobody on the mailing lists likes the pipebind feature, and it’s a really odd beast. – Konrad Rudolph Jun 08 '21 at 07:46
  • Yes, for sure: at the current stage it seems better not to use `|> =>`, as it might change. But at least it means they are working on it, and there will be a possibility to have pipes with substitution in base. – GKi Jun 08 '21 at 07:57