I'am just learning R and I try to find a way to select rows from a data.frame
I would like to select for every ID
one row for each group "pre"
and post
which has the smalles difference in abs(d_days
), i.e have a similar interval pre-ref, ref-post. The minimum difference should be 1 year.
I am not looking for a maximum or minimum difference, but for the closest/nearest/ most similar difference > 1 year.
my test data.frame looks like:
"ID" "date" "d_days" "group"
"00377698" 2006-11-15 -1006 "pre"
"00377698" 2009-08-16 -1 "pre"
"00377698" 2009-08-17 0 "ref"
"00377698" 2009-08-24 7 "post"
"00377698" 2009-09-03 17 "post"
"00377698" 2009-10-09 53 "post"
"00377698" 2010-02-26 193 "post"
"00377698" 2010-08-27 375 "post"
"00377698" 2010-11-26 466 "post"
"00377698" 2011-08-24 737 "post"
"00540688" 2009-06-26 -1664 "pre"
"00540688" 2009-08-20 -1609 "pre"
"00540688" 2009-11-20 -1517 "pre"
"00540688" 2010-11-17 -1155 "pre"
"00540688" 2011-12-07 -770 "pre"
"00540688" 2014-01-09 -6 "pre"
"00540688" 2014-01-15 0 "ref"
"00540688" 2014-01-20 5 "post"
"00540688" 2014-03-05 49 "post"
"00540688" 2015-04-29 469 "post"
"00540688" 2015-09-30 623 "post"
"00540688" 2016-05-13 849 "post"
my attempts:
I tried somethinkg like data.frame %>% group_by(ID,group) %>% filter (group=="pre"| group=="post" & abs(d_days > 365)) %>% summarise(b = nth(abs(d_days[1]), which.max(abs(d_days[2]))))
I also tried roll=nearest
R - merge dataframes on matching A, B and *closest* C?
I also tried this find value closest to x by group in dplyr, but it did not work out since I am not looking for values close to a specific value but just the "closest" of two sub-groups.
Unfortunately, I could not get what I am looking for:
"ID" "date" "d_days" "group"
"00377698" 2006-11-15 -1006 "pre"
"00377698" 2009-08-17 0 "ref"
"00377698" 2011-08-24 737 "post"
"00540688" 2011-12-07 -770 "pre"
"00540688" 2014-01-15 0 "ref"
"00540688" 2015-09-30 894 "post"
Many thanks for any help!