I have a dataset `df`:
```r
library(tibble)

df <- tibble(
  id = sort(rep(letters[1:3], 3)),
  visit_id = rep(c(0, 5, 10), 3),
  true_visit = c(NA, 3, NA, 0, 5, 10, 1, 7, NA)
)
```
```
> df
# A tibble: 9 x 3
  id    visit_id true_visit
  <chr>    <dbl>      <dbl>
1 a            0         NA
2 a            5          3
3 a           10         NA
4 b            0          0
5 b            5          5
6 b           10         10
7 c            0          1
8 c            5          7
9 c           10         NA
```
I'm trying to create a new column `closest_visit` where I find the `true_visit` that is closest to `visit_id` within each individual. The result would look like:
```
# A tibble: 9 x 4
  id    visit_id true_visit closest_visit
  <chr>    <dbl>      <dbl>         <dbl>
1 a            0         NA             3
2 a            5          3             3
3 a           10         NA             3
4 b            0          0             0
5 b            5          5             5
6 b           10         10            10
7 c            0          1             1
8 c            5          7             7
9 c           10         NA             7
```
To clarify, `closest_visit` is 3 for every row of individual `a` because 3 is that individual's only non-missing `true_visit`. `closest_visit` is 1 for the seventh row because 0 (the `visit_id` for that row) is closer to 1 than to 7 (the `true_visit` values for that participant), and so on.
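The rule described above can be sketched in R, assuming dplyr is available: within each `id`, drop the missing `true_visit` values and, for each `visit_id`, keep the remaining value with the smallest absolute difference. This is only a minimal sketch; `closest_value()` is a hypothetical helper introduced for illustration, and the handling of ties (two `true_visit` values equally far away) is an assumption, since the question does not specify it: `which.min()` keeps the first one in row order.

```r
library(dplyr)

df <- tibble(
  id = sort(rep(letters[1:3], 3)),
  visit_id = rep(c(0, 5, 10), 3),
  true_visit = c(NA, 3, NA, 0, 5, 10, 1, 7, NA)
)

# Hypothetical helper: return the non-missing candidate closest to `target`.
# Ties are resolved by which.min(), i.e. the first candidate in row order (assumption).
closest_value <- function(target, candidates) {
  candidates <- candidates[!is.na(candidates)]
  if (length(candidates) == 0) return(NA_real_)  # no usable true_visit for this id
  candidates[which.min(abs(candidates - target))]
}

result <- df %>%
  group_by(id) %>%
  # Inside a grouped mutate(), true_visit refers only to the current id's rows
  mutate(closest_visit = vapply(visit_id, closest_value, numeric(1),
                                candidates = true_visit)) %>%
  ungroup()

result
```

On the example data this should reproduce the expected `closest_visit` column shown above; an individual with no non-missing `true_visit` at all would get `NA`.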
I tried looking here, here, and here. They were a good start but not exactly what I'm looking for. Any ideas?