I have a large data frame in which multiple rows are repeated measurements for a single ID. For each individual, I want to return the row with the maximum value of a given column, essentially a GROUP BY operation as in SQL.
Dataframe (for illustrative purposes)
ID  lac  pO2
M1    1   80
M1    4   80
M2    2   70
M2    3   70
M3    3   75
M3    5   75
I want to call max(lac) and return the following results.
ID  lac  pO2
M1    4   80
M2    3   70
M3    5   75
I've had a look around and thought that the by() function might be useful, but haven't had any joy (code below).
newdf <- by(df, df$ID, max(df$lac))
Error in FUN(X[[1L]], ...) : could not find function "FUN"
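As far as I can tell, the error comes from the third argument: max(df$lac) is evaluated straight away to a single number, whereas by() expects a function to apply to each subset. A minimal sketch of what I think the call should look like, assuming the example data above is stored in df (the anonymous function and the name pieces are my own, and which.max keeps only the first row if there are ties):

```r
# Illustrative data matching the example in the question.
df <- data.frame(
  ID  = c("M1", "M1", "M2", "M2", "M3", "M3"),
  lac = c(1, 4, 2, 3, 3, 5),
  pO2 = c(80, 80, 70, 70, 75, 75)
)

# by() needs a function as its third argument; here each per-ID subset
# is reduced to the row holding its largest lac value.
pieces <- by(df, df$ID, function(d) d[which.max(d$lac), ])

# by() returns a list-like object, so bind the pieces back into one data frame.
newdf <- do.call(rbind, pieces)
```

This should leave one row per ID in newdf, with the other columns (pO2 here) carried along.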
I also looked at tapply, but this doesn't work because I'm passing it a data frame rather than a vector.
newdf <- tapply(df, df$ID, max)
Error: "arguments must have same length"
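For what it's worth, tapply does seem to work when given the lac column instead of the whole data frame, but it only returns the maxima, not the full rows. A rough sketch, again assuming the example data is in df; using ave() to pick out the matching rows is my own guess at a workaround, and it keeps every tied row rather than just one:

```r
# Illustrative data matching the example in the question.
df <- data.frame(
  ID  = c("M1", "M1", "M2", "M2", "M3", "M3"),
  lac = c(1, 4, 2, 3, 3, 5),
  pO2 = c(80, 80, 70, 70, 75, 75)
)

# tapply wants a vector plus a grouping vector of the same length.
maxima <- tapply(df$lac, df$ID, max)

# ave() repeats each group's maximum alongside every row of that group,
# so comparing it with lac selects the rows that hold the maximum.
newdf <- df[df$lac == ave(df$lac, df$ID, FUN = max), ]
```

Here maxima is a named vector of the per-ID maxima, while newdf keeps the complete rows.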
I've looked at similar answers, but these haven't helped. I'd appreciate some input from people more experienced than I!
Edit
Having dug a little deeper I've uncovered this question which suggests the plyr package might be useful.