Value of first and last non missing time point by condition of other variables. conditional coalesce (dplyr)

Question

a/b/c is different variables, t1 is time point 1, t2 is time point 2, t3 is time point 3.

The purpose is to create a two new columns: one with the first and one with the last non missing value for each row of a_t1 to a_t3. On the condition that it is also non missing in the variables b and c (at the same time point).

I think coalesce() could work with some kind of conditional formatting. However, my knowledge of this is limited. Preferably tidyverse solution, but other solutions works as well.

library(tidyverse)

df<-tibble::tribble(
                       ~a_t1, ~a_t2, ~a_t3, ~b_t1, ~b_t2, ~b_t3, ~c_t1, ~c_t2, ~c_t3,
                           1,    NA,     9,     2,    NA,     6,     3,    NA,     7,
                           2,    NA,     8,    NA,     5,     8,    NA,     1,     8,
                          NA,    NA,     3,     2,    NA,     9,     2,    NA,    22,
                          NA,     5,     9,     4,    NA,     9,     4,     5,    NA,
                          NA,     9,    10,    NA,     6,    11,    NA,     6,    NA
                       )

Expected output for the first value of a: 1, 8, 3, NA, 9

Expected output for the last value of a: 9, 8, 3, NA, 9

(in the real dataset there is more timepoints and variables to consider)

Can you show your expected output for given data? – Ronak Shah Jun 16 '20 at 07:35 — Ronak Shah, Jun 16 '20 at 07:35

score 2 · Accepted Answer · answered Jun 16 '20 at 08:27

A way using dplyr and tidyr :

library(dplyr)
library(tidyr)

df %>%
  #Create a row number
  mutate(row = row_number()) %>%
  #Get data in long format
  pivot_longer(cols = -row) %>%
  #Separate the data in two columns
  separate(name, c('name1', 'name2'), sep = '_') %>%
  #Group by each row and t1, t2 columns
  group_by(row, name2) %>%
  #Drop groups with all `NA` values
  filter(all(!is.na(value))) %>%
  #For each row get first and last value for "a" columns
  group_by(row) %>%
  summarise(first = first(value[name1 == 'a']), 
            last = last(value[name1 == 'a'])) %>%
  #Complete the data for missing rows.
  complete(row = 1:nrow(df))

#    row first last
#  <int> <dbl> <dbl>
#1     1     1     9
#2     2     8     8
#3     3     3     3
#4     4    NA    NA
#5     5     9     9

Thanx for the great solution. I had an afterthought: is it possible to also get the position of the value that is used. In this case if it was in the first, second or third column of "a" (or the name of the column). — Droc, Jun 16 '20 at 09:28
Yes, that is possible but perhaps you should ask it as a new question. — Ronak Shah, Jun 16 '20 at 10:53

Value of first and last non missing time point by condition of other variables. conditional coalesce (dplyr)

1 Answers1