Is there a way to pause a series of pipes to store a temporary variable that can be used later on in pipe sequence?

I found this question but I'm not sure that it was doing the same thing I am looking for.

Here's a sample dataframe:

library(dplyr)
library(tidyr)  # gather() comes from tidyr
set.seed(123)
df <- tibble(Grp = c("Apple","Boy","Cat","Dog","Edgar","Apple","Boy","Cat","Dog","Edgar"),
             a = sample(0:9, 10, replace = TRUE),
             b = sample(0:9, 10, replace = TRUE),
             c = sample(0:9, 10, replace = TRUE),
             d = sample(0:9, 10, replace = TRUE),
             e = sample(0:9, 10, replace = TRUE),
             f = sample(0:9, 10, replace = TRUE),
             g = sample(0:9, 10, replace = TRUE))

I am going to convert df to long format but, after doing so, I will need to use the number of rows df had before the gather.

This is what my desired output looks like. In this case, storing the number of rows before the pipe begins would look like:

n <- nrow(df)

df %>% 
  gather(var, value, -Grp) %>% 
  mutate(newval = value * n)
# A tibble: 70 x 4
   Grp   var   value newval
   <chr> <chr> <int>  <int>
 1 Apple a         2     20
 2 Boy   a         7     70
 3 Cat   a         4     40
 4 Dog   a         8     80
 5 Edgar a         9     90
 6 Apple a         0      0
 7 Boy   a         5     50
 8 Cat   a         8     80
 9 Dog   a         5     50
10 Edgar a         4     40
# ... with 60 more rows

In my real-world problem, I have a long chain of pipes, and it would be a lot easier if I could do this within the pipe structure. I would like to do something that looks like this:

df %>% 
  { "n = nrow(.)" } %>% # temporary variable is created here but df is passed on
  gather(var, value, -Grp) %>% 
  mutate(newval = value * n)

I could do something like the following, but it seems really sloppy.

df %>% 
  mutate(n = nrow(.)) %>% 
  gather(var, value, -Grp, -n) %>% 
  mutate(newval = value * mean(n))

Is there a way to do this or perhaps a good workaround?

hmhensen

2 Answers

You could use a code block to hold a local variable. It would look like this:

df %>% 
{ n = nrow(.)
  gather(., var, value, -Grp) %>% 
  mutate(newval = value * n)
}

Notice that we have to pass the . to gather explicitly here, and that the pipe continues inside the block. You can still chain other steps after the block:

df %>% 
{ n = nrow(.)
  gather(., var, value, -Grp) %>% 
  mutate(newval = value * n)
} %>% 
select(newval)
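
As a rough check of the block approach, the sketch below compares it against computing the row count before the pipe starts. The names `df_small`, `n_rows`, `expected`, and `result` are made up for this illustration, and the data is a smaller stand-in for the sample data frame.

```r
library(dplyr)
library(tidyr)

set.seed(123)
# Smaller stand-in for the sample data (illustrative only).
df_small <- tibble(Grp = c("Apple", "Boy", "Cat", "Apple", "Boy", "Cat"),
                   a = sample(0:9, 6, replace = TRUE),
                   b = sample(0:9, 6, replace = TRUE))

# Reference: compute the row count outside the pipe.
n_rows <- nrow(df_small)
expected <- df_small %>%
  gather(var, value, -Grp) %>%
  mutate(newval = value * n_rows)

# Block version: inside { }, the incoming data is only available as `.`,
# so it has to be handed to gather() explicitly.
result <- df_small %>%
  { n <- nrow(.)
    gather(., var, value, -Grp) %>%
      mutate(newval = value * n)
  }

# Compare the two results.
isTRUE(all.equal(expected, result))
```

The point of the comparison is that `n` is computed from the data as it enters the block, before `gather()` changes the number of rows.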
MrFlick

Here is an option with the %>>% pipe operator from pipeR:

library(pipeR)
library(dplyr)
library(tidyr)
df %>>% 
  (~ n = nrow(.)) %>% 
  gather(., var, value, -Grp) %>%
  mutate(newval = value * n)
# A tibble: 70 x 4
#   Grp   var   value newval
#   <chr> <chr> <int>  <int>
# 1 Apple a         2     20
# 2 Boy   a         7     70
# 3 Cat   a         4     40
# 4 Dog   a         8     80
# 5 Edgar a         9     90
# 6 Apple a         0      0
# 7 Boy   a         5     50
# 8 Cat   a         8     80
# 9 Dog   a         5     50
#10 Edgar a         4     40
# … with 60 more rows
akrun
  • Can you explain what's going in and coming out of the second line? I just looked up the `pipeR` pipe and get the basics now, but I don't see why this works. `df` is being `pipeR` piped into the parentheses, so the pipe sends `df` only to `.`, but I'm lost after that. – hmhensen May 09 '19 at 22:26
  • @hmhensen The `.` in `%>%` syntax is the data output from the `lhs` of the `%>%`, passed on for processing on the rhs of the pipe. The `~` is an anonymous function call (you can find similar syntax in the tidyverse) that generates an object identifier (`n`) and stores it in the environment for later processing. The `gather` step then uses the `.` data coming from `df`, i.e. the data itself, and the `mutate` is just dplyr syntax to create a new column. – akrun May 09 '19 at 22:30
  • I'm still not really sure why the `pipeR` pipe works here while the `magrittr` pipe doesn't. How is `df` making it through to the second pipe? Why is `n` not being passed through? With MrFlick's use of brackets, it makes sense to me. I'm not following that part here though since there's only one thing happening in the parentheses. I guess my lack of programming experience is shining through. – hmhensen May 11 '19 at 00:13
  • @hmhensen `n` is available if you check the last step. – akrun May 11 '19 at 00:14
  • Sorry, I mean why do the objects that come out of the parentheses include both `df` AND `n`? I see why the `n` is output, but how does `df` get output? Since `df` is acted upon by `nrow`, how does it come out of the parentheses intact, too? – hmhensen May 11 '19 at 00:28
  • @hmhensen That is the original output (the master output coming from the pipe, `.`), while the second one, i.e. `nrow(.)`, is assigned to a new identifier `n`, which we then use. I think you can also assign multiple other variables and refer to them by their identifiers. – akrun May 11 '19 at 00:30
  • @hmhensen Check this `df %>>% (~ n = nrow(.)) %>>% (~ m = 25) %>% gather(., var, value, -Grp) %>% mutate(newval = value * n, newval2 = m)` – akrun May 11 '19 at 00:32
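
To illustrate the idea discussed in these comments (a step that stores a value as a side effect and still hands the data on unchanged), here is a minimal base R sketch. It is only an analogy, not pipeR's actual implementation; `assign_and_pass`, `small`, and `store` are hypothetical names invented for this example.

```r
# Hypothetical helper, not part of pipeR: performs a side effect,
# then returns its input unchanged.
assign_and_pass <- function(data, name, fn, env) {
  assign(name, fn(data), envir = env)  # side effect: store fn(data) under `name`
  data                                 # return value: the untouched input
}

small <- data.frame(Grp = c("Apple", "Boy", "Cat"), a = c(2L, 7L, 4L))
store <- new.env()

passed <- assign_and_pass(small, "n", nrow, store)

identical(passed, small)  # compare what came out with what went in
get("n", envir = store)   # the row count stored by the side effect
```

Under this reading, `nrow` only inspects the data; what flows to the next step is the function's return value, which is the original data rather than the result of `nrow`.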