4

I've got a dataframe like this one:

stage1 stage2 stage3 stage4
a        NA     b      c
NA       d      NA     e
NA       NA     f      g
NA       NA     NA     h

Each column is a stage in a process. I want to coalesce each column with all the columns before it, so that every cell takes the first non-NA value found in its row up to and including that column:

stage1 stage2 stage3 stage4 
a        a      a      a
NA       d      d      d
NA       NA     f      f
NA       NA     NA     h

The actual values don't really matter; this could also be a logical dataframe, where each string from the output is TRUE and each NA is FALSE.
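
For reproducibility, here is a minimal sketch that builds the input and the desired output shown above. The names `expected` and `reached` are only illustrative and are not part of the original post; `reached` is the logical view described in the previous sentence.

```r
# Input from the question: one column per stage, NA where nothing was recorded
df <- data.frame(
  stage1 = c("a", NA, NA, NA),
  stage2 = c(NA, "d", NA, NA),
  stage3 = c("b", NA, "f", NA),
  stage4 = c("c", "e", "g", "h"),
  stringsAsFactors = FALSE
)

# Desired output: each cell holds the first non-NA value up to that column
expected <- data.frame(
  stage1 = c("a", NA, NA, NA),
  stage2 = c("a", "d", NA, NA),
  stage3 = c("a", "d", "f", NA),
  stage4 = c("a", "d", "f", "h"),
  stringsAsFactors = FALSE
)

# Logical view (a logical matrix): TRUE where there is a value, FALSE where NA
reached <- !is.na(expected)
```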

I've written this function that lets me coalesce through a selection of columns:

library(dplyr)

# Coalesce over a tidyselect column selection; meant to be called inside mutate()
coacross <- function(...) {
  coalesce(!!!across(...))
}

df <- df %>%
  mutate(total_stages = coacross(everything()))

This essentially produces the stage4 column of my desired output. Is there a way to run it iteratively, ideally without a for loop, so that I get the same for stage2 and stage3? Failing that, is there another way to do this?

Thanks a lot.

Edit:

This works:

for (col in setdiff(names(df), "stage1")) {
  # Overwrite each column with the coalesce of stage1 through that column
  df <- df %>%
    mutate(!!col := coacross(stage1:all_of(col)))
}

But any more elegant solutions are greatly appreciated.

Juan C
  • 1
    Probably not as elegant as you want, but you could do `df %>% mutate(row = row_number()) %>% pivot_longer(-row) %>% group_by(row) %>% fill(value) %>% pivot_wider(names_from = name, values_from = value)`. Here's a prior question using this approach with an earlier tidyr syntax: https://stackoverflow.com/a/54601554/6851825 – Jon Spring Apr 12 '23 at 21:15

2 Answers

7

You could also use accumulate:

library(tidyverse)
as_tibble(accumulate(df, coalesce))

# A tibble: 4 × 4
  stage1 stage2 stage3 stage4
  <chr>  <chr>  <chr>  <chr> 
1 a      a      a      a     
2 NA     d      d      d     
3 NA     NA     f      f     
4 NA     NA     NA     h  
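
To see why this works: accumulate() folds left over the columns and keeps every intermediate result, so each output column is the coalesce of the previous output column and the current input column. Below is a rough base-R sketch of the same idea using Reduce(accumulate = TRUE). It is a swapped-in equivalent for illustration, not what purrr does internally, and `coalesce2` is a hypothetical helper that does not appear in the answer.

```r
# Same input as in the question
df <- data.frame(
  stage1 = c("a", NA, NA, NA),
  stage2 = c(NA, "d", NA, NA),
  stage3 = c("b", NA, "f", NA),
  stage4 = c("c", "e", "g", "h"),
  stringsAsFactors = FALSE
)

# Hypothetical two-argument coalesce: keep `acc` where it is non-NA, else take `x`
coalesce2 <- function(acc, x) ifelse(is.na(acc), x, acc)

# accumulate = TRUE returns every running result, one per column
steps <- Reduce(coalesce2, df, accumulate = TRUE)

# Reduce() drops the names, so restore them before rebuilding the data frame
result <- as.data.frame(setNames(steps, names(df)), stringsAsFactors = FALSE)
print(result)
```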
Onyambu
2

You can use across() with an assist from cur_column():

library(dplyr)

df %>%
  mutate(across(everything(), \(x) coacross(stage1:cur_column())))
  stage1 stage2 stage3 stage4
1      a      a      a      a
2   <NA>      d      d      d
3   <NA>   <NA>      f      f
4   <NA>   <NA>   <NA>      h
zephryl
  • across inside across, that's genius! is the `\(x)` some new syntax? it's giving me an error using `tidyverse` 1.3.1 but replaced it with `~` and it worked – Juan C Apr 12 '23 at 21:44
  • It’s not *that* new :), just as of R version 4.1. It’s base R shorthand for an anonymous function. `\(x) sum(x)` is equivalent to `function(x) sum(x)` or `~ sum(.x)`. – zephryl Apr 12 '23 at 21:47
  • Here I'm stuck with `3.6.0` until who knows when, heh. Thanks for the introduction to that, for when the future arrives at my workplace (: – Juan C Apr 12 '23 at 21:52
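
As a small illustration of the equivalence mentioned in the comments, this sketch compares the two base-R spellings. The names `long_form` and `short_form` are made up for the example. The `\(x)` line only parses on R 4.1 or later, and the `~ sum(.x)` form is left out because it is not base R: it is only understood by functions that convert formulas to functions, such as those in purrr and dplyr.

```r
# Classic anonymous function; works on any R version
long_form <- function(x) sum(x)

# Backslash shorthand; requires R >= 4.1
short_form <- \(x) sum(x)

# Both give the same result
print(long_form(1:3))
print(short_form(1:3))
print(sapply(list(1:3, 4:6), \(x) sum(x)))
```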