
Access result later in pipe

I am trying to create functions which print the number of rows excluded in a dataset at each step in a pipe.

Something like this:

iris %>% 
    function_which_save_nrows_and_return_the_data() %>% 
    filter(exclude some rows) %>% 
    function_which_prints_difference_in_rows_before_after_exclusion_and_returns_data() %>% 
    function_which_save_nrows_and_return_the_data() %>% 
    function_which_prints_difference_in_rows_before_after_exclusion_and_returns_data()  ...etc

These are the functions I have attempted:

n_before = function(x) {assign("rows", nrow(x), .GlobalEnv); return(x)}

n_excluded = function(x) { 
    print(rows - nrow(x))
    return(x)
}

This successfully saves the object rows.

But if I add two more links to the pipe, the object is NOT saved.

So how can I create the rows object and access it later in the pipe?

Rasmus Larsen

2 Answers


This is due to R's lazy evaluation, and it occurs even when pipes are not used; see the code below. There, the argument to n_excluded is filter(n_before(iris), Species != 'setosa'). When rows is used in the print statement, x has not yet been referenced from within n_excluded, so the argument has not been evaluated, n_before has not run, and rows does not yet exist.

library(dplyr)  # provides filter()
if (exists("rows")) rm(rows)  # ensure rows does not exist
n_excluded(filter(n_before(iris), Species != 'setosa'))
## Error in h(simpleError(msg, call)) : 
##   error in evaluating the argument 'x' in selecting a method for function 
##   'print': object 'rows' not found
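
The order of evaluation can also be seen without dplyr. The following is a minimal base-R sketch; note, make_arg, lazy_fn and eager_fn are hypothetical names invented for this illustration and are not part of the question.

```r
# Hypothetical helpers for illustration; none of these names come from the question.
events <- character()
note <- function(msg) events <<- c(events, msg)  # record what ran, in order

make_arg <- function() {
  note("argument evaluated")
  1
}

lazy_fn <- function(x) {
  note("body started")  # runs before x is touched
  x                     # the promise for x is evaluated here, on first use
}

eager_fn <- function(x) {
  force(x)              # evaluate the promise immediately
  note("body started")
  x
}

res <- lazy_fn(make_arg())
lazy_order <- events    # body first, then the argument

events <- character()
res <- eager_fn(make_arg())
eager_order <- events   # argument first, then the body
```

In lazy_fn the body starts before the argument is evaluated, which mirrors why rows did not yet exist when the print statement ran.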

To fix this:

1) We can force x before the print statement, so that the whole argument (including the call to n_before) is evaluated before rows is read.

n_excluded = function(x) { 
  force(x)
  print(rows - nrow(x))
  return(x)
}
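
As a quick check that needs only base R, the forced version can be exercised with nested calls. This is a sketch under one assumption: subset() is used as a stand-in for dplyr's filter() so that no packages are required. n_before is repeated from the question.

```r
# n_before as in the question: store the row count globally, return the data.
n_before <- function(x) {
  assign("rows", nrow(x), envir = .GlobalEnv)
  x
}

# n_excluded with force(): x (and thus n_before) is evaluated before rows is read.
n_excluded <- function(x) {
  force(x)
  print(rows - nrow(x))
  x
}

# Start from a clean state so a stale rows cannot mask the problem.
if (exists("rows", envir = .GlobalEnv, inherits = FALSE)) {
  rm(list = "rows", envir = .GlobalEnv)
}

# subset() stands in for dplyr::filter() here (an assumption of this sketch).
result <- n_excluded(subset(n_before(iris), Species != "setosa"))
```

With the built-in iris data this reports the 50 setosa rows as excluded and returns the remaining 100 rows.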

2) Alternatively, we can use magrittr's sequential pipe, which guarantees that the legs are run in order. magrittr makes the function available but does not provide an operator for it, so we assign it to an operator ourselves like this.

`%s>%` <- magrittr::pipe_eager_lexical
iris %>%
  n_before() %>%
  filter(Species != 'setosa') %s>%  # note use of %s>% on this line
  n_excluded()

The magrittr developer has stated that he will add it as an operator if there is sufficient demand, so you might want to add such a request to magrittr issue #247 on GitHub.
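
To illustrate what running the legs in order means, here is a minimal base-R sketch of an eager pipe. The operator %e>% and the helper drop_setosa are hypothetical names written for this illustration; this is not magrittr's implementation, and unlike magrittr's pipes it only accepts a function on the right-hand side. subset() stands in for filter().

```r
# Hypothetical eager pipe: evaluate the left-hand side fully, then call rhs on it.
`%e>%` <- function(lhs, rhs) {
  force(lhs)
  rhs(lhs)
}

n_before <- function(x) {
  assign("rows", nrow(x), envir = .GlobalEnv)
  x
}

# Deliberately the original, unforced version from the question.
n_excluded <- function(x) {
  print(rows - nrow(x))
  x
}

# Stand-in for filter(Species != "setosa"), an assumption of this sketch.
drop_setosa <- function(d) subset(d, Species != "setosa")

result <- iris %e>% n_before %e>% drop_setosa %e>% n_excluded
```

Because each left-hand side is forced before the next function is called, n_before has already stored rows by the time n_excluded reads it, even though n_excluded itself does not call force().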

G. Grothendieck

You can also use the extended capabilities of pipeR.

library(dplyr)
library(pipeR)
  
n_excluded = function(x) { 
  print(rows - nrow(x))
  return(x)
}

p <- iris %>>%
   (~rows=nrow(.)) %>>%
   filter(Species != "setosa") %>>%
   n_excluded()
Marco Sandri