
I have a main function which performs a handful of variously complicated (and long-running) computations on some data. It performs these steps using the pipe from tidyverse / magrittr. I would like a progress bar to report on the stage of the processing as it works through it, but I'm at a loss. I've looked at the cli, progress and progressr packages, and of those I could only get cli to work (in a manner of speaking).

Here's a minimal example:

library(tidyverse)
library(cli)

main_fun <- function() {
  cli_progress_step(msg = "Running main function")
  tibble(a = 1:5) %>% 
    fun1() %>% 
    fun2() %>% 
    fun3()
}

fun1 <- function(data) {
  cli_progress_step(msg = "Doing sub function 1")
  Sys.sleep(2)

  return(data)
}
fun2 <- function(data) {
  cli_progress_step(msg = "Doing sub function 2")
  Sys.sleep(1)

  return(data)
}
fun3 <- function(data) {
  cli_progress_step(msg = "Doing sub function 3")
  Sys.sleep(3)

  return(data)
}

main_fun()
#> ℹ Running main function
#> ℹ Doing sub function 3
#> ℹ Doing sub function 2
#> ℹ Doing sub function 1
#> ✔ Doing sub function 1 [2s]
#> 
#> ℹ Doing sub function 2✔ Doing sub function 2 [3s]
#> 
#> ℹ Doing sub function 3✔ Doing sub function 3 [6.1s]
#> 
#> ℹ Running main function✔ Running main function [6.1s]
#> # A tibble: 5 × 1
#>        a
#>    <int>
#>  1     1
#>  2     2
#>  3     3
#>  4     4
#>  5     5

This displays the progress messages, but in 'reverse' order, i.e. 3, then 2, then 1. Once everything has completed, all of them are shown, which is about the only bit I'm happy with.

Moohan
  • I got it to work by removing the `%>%` and doing regular `<-` assignments. I don't know exactly why. It could be something in the pipe implementation affecting cli progress bars. I also tried the native pipe `|>` and had the same problem. Pipes aren't needed as much in shorter functions, since the objects only exist in the function scope. That doesn't really answer the question, but it's a workaround. – Robert Schauner May 05 '23 at 16:48

2 Answers


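This is because, in a pipe, functions are not evaluated from left to right. Regular R evaluation semantics apply: arguments are evaluated lazily (call-by-need), so each function body starts running before the argument piped into it has been computed. With the base pipe |>, your call is equivalent to: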

fun3(fun2(fun1(tibble(a = 1:5))))
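
To make the call-by-need behaviour concrete, here is a minimal base R sketch. The names `events`, `make_data`, `lazy_fun` and `unused_fun` are made up for illustration. It records when the function body starts and when the argument is actually computed.

```r
events <- character()  # records what happens, in order

make_data <- function() {
  events <<- c(events, "argument evaluated")
  1:5
}

lazy_fun <- function(data) {
  events <<- c(events, "function body entered")
  data  # the promise for `data` is forced here, on first use
}

unused_fun <- function(data) {
  events <<- c(events, "argument never touched")
  invisible(NULL)  # `data` is never used, so make_data() is not called
}

res <- lazy_fun(make_data())
unused_fun(make_data())

events
#> [1] "function body entered"  "argument evaluated"     "argument never touched"
```

The body of `lazy_fun()` starts before `make_data()` runs, which is the same reason the message from `fun3()` appears first in the nested call above.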

You can force the evaluation e.g. with forceAndCall.

data.frame(a = 1:5) |> forceAndCall(n=1, Fun=fun1, data=_) |>
  forceAndCall(n=1, Fun=fun2, data=_) |> forceAndCall(n=1, Fun=fun3, data=_)
#✔ Doing sub function 1 [2s]
#✔ Doing sub function 2 [1s]
#✔ Doing sub function 3 [3s]
#...

Or, with magrittr, you can use the eager pipe %!>% to evaluate from left to right (thanks @Moohan for the comment!).

data.frame(a = 1:5) %!>% fun1() %!>%  fun2() %!>% fun3()
#✔ Doing sub function 1 [2s]
#✔ Doing sub function 2 [1s]
#✔ Doing sub function 3 [3s]
#...

Alternatively, you can force the evaluation of the argument on the first line of each function, so that the steps run and report in the order you expected. This works with both pipes, |> and %>%.

library(magrittr)
library(cli)

fun1 <- function(data) {
  force(data) # or simply `data` on a line of its own
  cli_progress_step(msg = "Doing sub function 1")
  Sys.sleep(2)
  data
}
fun2 <- function(data) {
  force(data)
  cli_progress_step(msg = "Doing sub function 2")
  Sys.sleep(1)
  data
}
fun3 <- function(data) {
  force(data)
  cli_progress_step(msg = "Doing sub function 3")
  Sys.sleep(3)
  data
}

data.frame(a = 1:5) %>% fun1() %>% fun2() %>% fun3()
#✔ Doing sub function 1 [2s]
#✔ Doing sub function 2 [1s]
#✔ Doing sub function 3 [3s]
#...

data.frame(a = 1:5) |> fun1() |> fun2() |> fun3()
#✔ Doing sub function 1 [2s]
#✔ Doing sub function 2 [1s]
#✔ Doing sub function 3 [3s]
#...
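
If you would rather not edit every function, the same idea can be applied from outside. The following is only a sketch: `eagerly()` and `make_step()` are hypothetical helpers, not part of cli or magrittr, and the steps log to a vector instead of calling `cli_progress_step()` so that the order is easy to inspect.

```r
# Hypothetical helper: wrap a step so its input is computed before it starts.
eagerly <- function(f) {
  force(f)
  function(data, ...) {
    force(data)  # runs every upstream step first
    f(data, ...)
  }
}

events <- character()  # records the order in which steps start

# Hypothetical step factory: each step logs its name, then returns its input.
make_step <- function(name) {
  force(name)
  function(data) {
    events <<- c(events, name)
    data
  }
}

step1 <- make_step("step 1")
step2 <- make_step("step 2")

# Unwrapped steps: the last step in the pipe starts first.
lazy_out <- data.frame(a = 1:5) |> step1() |> step2()
lazy_order <- events
lazy_order
#> [1] "step 2" "step 1"

# Wrapped steps: each step starts only after its input has been computed.
events <- character()
eager_step1 <- eagerly(step1)
eager_step2 <- eagerly(step2)
eager_out <- data.frame(a = 1:5) |> eager_step1() |> eager_step2()
eager_order <- events
eager_order
#> [1] "step 1" "step 2"
```

The wrapper does the same thing as putting `force(data)` on the first line of each function. It just keeps that detail out of the step functions themselves.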

Another way might be to write a custom pipe function.

`:=` <- function(lhs, rhs) eval(substitute(rhs), list(. = lhs))

data.frame(a = 1:5) := fun1(.) := fun2(.) := fun3(.)
#✔ Doing sub function 1 [2s]
#✔ Doing sub function 2 [1s]
#✔ Doing sub function 3 [3s]
#...
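
The `:=` version works because `list(. = lhs)` evaluates the left-hand side before the right-hand side is run. A simpler variant of the same idea, as a sketch only, is an infix operator that takes a function on the right. `%then%`, `add_b` and `add_c` are hypothetical names chosen for this example.

```r
# Hypothetical eager pipe: evaluate the left-hand side, then call the function.
`%then%` <- function(lhs, rhs) {
  force(lhs)  # everything to the left runs first
  rhs(lhs)    # rhs is expected to be a function of one argument
}

events <- character()  # records the order in which the functions start

add_b <- function(x) { events <<- c(events, "add_b"); x$b <- 1; x }
add_c <- function(x) { events <<- c(events, "add_c"); x$c <- 2; x }

out <- data.frame(a = 0) %then% add_b %then% add_c

events
#> [1] "add_b" "add_c"
out
#>   a b c
#> 1 0 1 2
```

Unlike `:=`, this sketch has no `.` placeholder, so it only covers the case where the piped value is the sole argument.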

Another example, showing when each function is entered and exited:

library(magrittr)
f1 <- \(x) {message("IN 1"); x$b <- 1; message("OUT 1"); x}
f2 <- \(x) {message("IN 2"); x$c <- 2; message("OUT 2"); x}

data.frame(a=0) %>% f1 %>% f2
#IN 2
#IN 1
#OUT 1
#OUT 2
#  a b c
#1 0 1 2

data.frame(a=0) |> f1() |> f2()
#IN 2
#IN 1
#OUT 1
#OUT 2
#  a b c
#1 0 1 2

f2(f1(data.frame(a=0)))
#IN 2
#IN 1
#OUT 1
#OUT 2
#  a b c
#1 0 1 2

data.frame(a=0) %!>% f1 %!>% f2
#IN 1
#OUT 1
#IN 2
#OUT 2
#  a b c
#1 0 1 2

data.frame(a=0) := f1(.) := f2(.)
#IN 1
#OUT 1
#IN 2
#OUT 2
#  a b c
#1 0 1 2

. <- data.frame(a=0)
. <- f1(.)
#IN 1
#OUT 1
. <- f2(.)
#IN 2
#OUT 2
.
#  a b c
#1 0 1 2
GKi
  • I guess I hadn't thought about the evaluation order as you lay out, but that makes sense... That said is there a trick / workaround to make messages display as we'd expect despite R evaluating in the 'opposite' direction? – Moohan May 15 '23 at 13:09
  • 1
    Maybe a solution for you is to write your own pipe function? – GKi May 15 '23 at 13:13
  • Thanks! I was wondering what any downside of using `force()` might be and I came across the 'eager pipe' from magrittr. Which is basically your solution(s)! https://magrittr.tidyverse.org/reference/pipe-eager.html – Moohan May 15 '23 at 14:58

This can be achieved using the 'eager pipe' (%!>%) from {magrittr}, which evaluates each step before moving on to the next:

library(tidyverse)
library(cli)
library(magrittr)

main_fun <- function() {
  cli_progress_step(msg = "Running main function")
  tibble(a = 1:5) %!>% 
    fun1() %!>% 
    fun2() %!>% 
    fun3()
}

main_fun()

#> ℹ Running main function
#> ℹ Doing sub function 1
#> ✔ Doing sub function 1 [2s]
#> 
#> ℹ Running main functionℹ Doing sub function 2
#> ✔ Doing sub function 2 [1s]
#> 
#> ℹ Running main functionℹ Doing sub function 3
#> ✔ Doing sub function 3 [3s]
#> 
#> ℹ Running main function✔ Running main function [6.1s]
#> # A tibble: 5 × 1
#>        a
#>    <int>
#>  1     1
#>  2     2
#>  3     3
#>  4     4
#>  5     5
moodymudskipper
Moohan