How to use dplyr's coalesce function with group_by() to create one row per person with all values filled in?

Question

I am trying to use coalesce() to produce one row per participant that has their name and their score. Participants had 3 opportunities to fill in their data, and most only came in once (and those that came in multiple times always put in the same data). So my data looks like:

library(dplyr)

test_dataset <- tibble(name = c("justin", "justin", "justin", "corey", "corey", "corey", "sib", "sib", "sib", "kate", "kate", "kate"),
                       score1 = c(NA_real_, NA_real_, 1, 2, NA_real_, NA_real_, 2, NA_real_, 2, NA_real_, NA_real_ , NA_real_),
                       score2 = c(NA_real_, 7, NA_real_, 5, NA_real_, NA_real_, 9, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_))

And I want it to look like:

library(dplyr)

answer <- tibble(name = c("justin", "corey", "sib", "kate"),
                       score1_true = c(1, 2, 2, NA),
                       score2_true = c(7, 5, 9, NA))

I've tried the below solution, which does give me the 'true' score, but it's spread out over 12 rows (3 rows per person) instead of 4 (one per person):

library(dplyr)

test_dataset %>%
  dplyr::group_by(name) %>%
  mutate(across(c(starts_with("score")), .fns = list(true = ~coalesce(.))))

corey has two values for score 1. How did you decide which one should be in the final result? — Allan Cameron, May 31 '22 at 20:41
`coalesce()` finds the first non-missing value at each position. But it doesn't exclude or filter the non-first values. — M.Viking, May 31 '22 at 20:42
@AllanCameron, he should only have one non-NA value. Fixed, thank you! — J.Sabree, May 31 '22 at 20:45

langtang · Answer 1 · 2022-05-31T21:21:19.850

4

You can use fill(), and then arrange() the scores and use slice_head():

test_dataset %>% 
  group_by(name) %>%
  fill(score1, score2) %>%
  arrange(score1, score2) %>%
  slice_head(n=1)

Output:

  name   score1_true score2_true
  <chr>        <dbl>       <dbl>
1 justin           1           7
2 corey            2           5
3 sib              2           9
4 kate            NA          NA

more concise/improved version thanks to @M.Viking:

use the .direction="up" option within fill()

test_dataset %>% 
  group_by(name) %>%
  fill(score1, score2, .direction="up") %>%
  slice_head(n=1)

edited May 31 '22 at 21:21

answered May 31 '22 at 20:45

langtang

22,248
1
12
27

1

`fill()` has fill direction options (`.direction = c("down", "up", "downup", "updown")`), which would let you avoid the arrange step: `fill(score1, score2, .direction="up") %>%` – M.Viking May 31 '22 at 20:55
1

thanks @M.Viking.. that's helpful. I wasn't sure about the OP's values for `name="Corey"`, but you are right in this case that using `.direction="up"` would avoid the `arrange()` – langtang May 31 '22 at 21:10
1

Fill is the way to go in this case, and if the data is dirty (multiple differing results per participant), we might need to take the order sequence into account (for example; to get the final non-NA value per participant: fill down and slice_tail) – M.Viking May 31 '22 at 21:17

akrun · Accepted Answer · 2022-05-31T20:46:19.360

We could reorder the values based on the NA elements and then slice the first row

library(dplyr)
test_dataset %>% 
  group_by(name) %>%
  dplyr::mutate(across(starts_with('score'), 
   ~ .x[order(is.na(.x))])) %>% 
  slice_head(n = 1) %>% 
  ungroup

-output

# A tibble: 4 × 3
  name   score1 score2
  <chr>   <dbl>  <dbl>
1 corey       2      5
2 justin      1      7
3 kate       NA     NA
4 sib         2      9

Or another option is to use complete.cases after rearranging

test_dataset %>% 
  group_by(name) %>%
  dplyr::mutate(across(starts_with('score'), 
   ~ .x[order(is.na(.x))])) %>% 
  filter(complete.cases(across(starts_with('score')))|row_number() == 1) %>%
   ungroup

-output

# A tibble: 4 × 3
  name   score1 score2
  <chr>   <dbl>  <dbl>
1 justin      1      7
2 corey       2      5
3 sib         2      9
4 kate       NA     NA

How to use dplyr's coalesce function with group_by() to create one row per person with all values filled in?

2 Answers2

more concise/improved version thanks to @M.Viking: