2

I am trying to use coalesce() to produce one row per participant that has their name and their score. Participants had 3 opportunities to fill in their data, and most only came in once (and those that came in multiple times always put in the same data). So my data looks like:

library(dplyr)

test_dataset <- tibble(name = c("justin", "justin", "justin", "corey", "corey", "corey", "sib", "sib", "sib", "kate", "kate", "kate"),
                       score1 = c(NA_real_, NA_real_, 1, 2, NA_real_, NA_real_, 2, NA_real_, 2, NA_real_, NA_real_ , NA_real_),
                       score2 = c(NA_real_, 7, NA_real_, 5, NA_real_, NA_real_, 9, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_))

And I want it to look like:

library(dplyr)

answer <- tibble(name = c("justin", "corey", "sib", "kate"),
                       score1_true = c(1, 2, 2, NA),
                       score2_true = c(7, 5, 9, NA))

I've tried the below solution, which does give me the 'true' score, but it's spread out over 12 rows (3 rows per person) instead of 4 (one per person):

library(dplyr)

test_dataset %>%
  dplyr::group_by(name) %>%
  mutate(across(c(starts_with("score")), .fns = list(true = ~coalesce(.))))
J.Sabree
  • 2,280
  • 19
  • 48

2 Answers2

4

You can use fill(), and then arrange() the scores and use slice_head():

test_dataset %>% 
  group_by(name) %>%
  fill(score1, score2) %>%
  arrange(score1, score2) %>%
  slice_head(n=1)

Output:

  name   score1_true score2_true
  <chr>        <dbl>       <dbl>
1 justin           1           7
2 corey            2           5
3 sib              2           9
4 kate            NA          NA

more concise/improved version thanks to @M.Viking:

  • use the .direction="up" option within fill()
test_dataset %>% 
  group_by(name) %>%
  fill(score1, score2, .direction="up") %>%
  slice_head(n=1)
langtang
  • 22,248
  • 1
  • 12
  • 27
  • 1
    `fill()` has fill direction options (`.direction = c("down", "up", "downup", "updown")`), which would let you avoid the arrange step: `fill(score1, score2, .direction="up") %>%` – M.Viking May 31 '22 at 20:55
  • 1
    thanks @M.Viking.. that's helpful. I wasn't sure about the OP's values for `name="Corey"`, but you are right in this case that using `.direction="up"` would avoid the `arrange()` – langtang May 31 '22 at 21:10
  • 1
    Fill is the way to go in this case, and if the data is dirty (multiple differing results per participant), we might need to take the order sequence into account (for example; to get the final non-NA value per participant: fill down and slice_tail) – M.Viking May 31 '22 at 21:17
3

We could reorder the values based on the NA elements and then slice the first row

library(dplyr)
test_dataset %>% 
  group_by(name) %>%
  dplyr::mutate(across(starts_with('score'), 
   ~ .x[order(is.na(.x))])) %>% 
  slice_head(n = 1) %>% 
  ungroup

-output

# A tibble: 4 × 3
  name   score1 score2
  <chr>   <dbl>  <dbl>
1 corey       2      5
2 justin      1      7
3 kate       NA     NA
4 sib         2      9

Or another option is to use complete.cases after rearranging

test_dataset %>% 
  group_by(name) %>%
  dplyr::mutate(across(starts_with('score'), 
   ~ .x[order(is.na(.x))])) %>% 
  filter(complete.cases(across(starts_with('score')))|row_number() == 1) %>%
   ungroup

-output

# A tibble: 4 × 3
  name   score1 score2
  <chr>   <dbl>  <dbl>
1 justin      1      7
2 corey       2      5
3 sib         2      9
4 kate       NA     NA
akrun
  • 874,273
  • 37
  • 540
  • 662