Sorry if this is a duplicate but could not quite find what was right for me.
I have a Dataframe 1 - where Episode_ID is unique but patient_ID is not
Episode_ID Patient_ID Drug Commenced_date
1 1 0 A 2018-01-02
2 2 0 B 2020-11-26
3 3 1 B 2017-09-26
4 4 2 B 2020-11-26
and Dataframe 2 where Visit_ID is unique but Patient_ID is not
Patient_ID Visit_ID Visit_date Visit_Score
1 0 1 2018-01-22 2.3
2 0 2 2019-06-01 1.5
3 0 3 2020-11-26 1.6
4 1 B 2020-11-26 1.4
What I want to achieve is for each episode (from df1) to be matched to that patients closest visit in (df2) by looping through all the visit_dates (matched by patient_ID) in df 2 and have the associated visit_score appended. As each row in Df1 is iterated, the patient ID can be matched to patient ID in df2. So the first row, should be a 1 to 3 match. datediff with which.min can then be applied to find the closest match. Its the coding syntax that I really can't get down right.
For example, based on the above. The result would be:
Episode_ID Patient_ID Drug Commenced_date Closest_Visit Visit_score
1 1 0 A 2018-01-02 2018-01-22 2.3
2 2 0 B 2020-11-26 2020-11-26 1.6
3 3 1 B 2017-09-26 2018-01-22 2.3
4 4 2 B 2020-11-26 2020-11-26 1.4
This should be quite simple but I've becoming really rusty in R and out of practice. I tried to use a nested for loop but it kept breaking. Then tried using sapply but couldn't quite get it to work.
Any help would be greatly appreciated. Happy for a solution in python as well but prefer R for now given the rest of the project at the moment.