I have a large dataset in R in which each subject (Label: 1, 2, 3, ...) is scanned two or more times for fat mass, lean mass, etc. at each of several time points (Comments: PRE F1 BMR, POST F1 BMR, ...). Some scans are erratic, so we can't simply average all of them. I need a way to automatically select, within each subject and time point, the two rows whose measurements of one variable (Fat) are closest to each other. Here's what the dataset looks like:
(Image: example of the dataset, showing multiple scans for subject 16 at POST F1 BMR)
I've been grouping the data by Label and Comments, but is there a way to then slice out the two rows within each group whose Fat measurements are closest?
(P.S. Still a struggling R user and first time posting on StackOverflow, so forgive the layout!)
Edit: Here's a simple test case with the intended result:
set.seed(2)
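df <- data.frame(Fat = sample(1:10, 12, replace = TRUE),
                 Lean = sample(1:5, 12, replace = TRUE),
                 Label = rep(1:2, c(5, 7)),
                 Comments = rep(c("PRE BMR", "POST BMR", "PRE BMR", "POST BMR"),
                                c(2, 3, 2, 5)))
# Intended result: rows 4, 8, 9 and 12 dropped, leaving two rows per Label/Comments group
dfresults <- df[-c(4, 8, 9, 12), ]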
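To make the selection rule concrete, here is a minimal base R sketch of what "closest" could mean, assuming the best pair in a group is the two rows with the smallest absolute difference in Fat. The helper name `closest_pair`, the toy values, and the tie-breaking (the first pair found wins) are assumptions for illustration only, not an established method.

```r
# Hypothetical helper: return the two rows of a group whose Fat values
# are closest (smallest absolute difference). On ties, the first pair wins.
closest_pair <- function(g) {
  if (nrow(g) <= 2) return(g)          # one or two scans: nothing to choose
  pairs <- combn(nrow(g), 2)           # every pair of row positions, one pair per column
  gaps  <- abs(g$Fat[pairs[1, ]] - g$Fat[pairs[2, ]])
  g[pairs[, which.min(gaps)], ]
}

# Small hand-made example (assumed values, not real scan data)
toy <- data.frame(Fat      = c(10, 30, 12, 5, 6),
                  Label    = c(1, 1, 1, 2, 2),
                  Comments = "PRE BMR")

# Split by subject and time point, keep the closest pair in each group, recombine
groups <- split(toy, list(toy$Label, toy$Comments), drop = TRUE)
picked <- do.call(rbind, lapply(groups, closest_pair))
```

In the toy data, subject 1 has Fat values 10, 30 and 12, so the sketch keeps the rows with 10 and 12 and drops the outlying 30. The same `split()` / `lapply()` call should work on `df` from the test case in place of `toy`.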