R: Find max value for column among a subset of a data frame

Question

I have a dataframe df with columns ID, Year, Value1, Value2, Value3 and 21788928 rows. I need to subset the data by Year and IDand find the max Value1 in that subset saving the rest of information of that row, I need to do that for all the combinations of Year and ID(Year goes from 1982 to 2013, ID is from 1 to 28371)

I was trying to do that in a double for loop:

year<-seq(1982, 2013)
cnt=1
for (i in 1:32) {
  for (j in 1:28371)

    A<-df[df$Year==year[i]&df$ID==j,] 
    maxVal[cnt,]<-A[A$Value1==max(A$Value1),]
    cnt=cnt+1
}
}

but it takes way to long. Is there a more efficient way to do that? Maybe using ddply or with.

Please post a sample of the data, and the desired result – Rich Scriven Aug 01 '15 at 15:50 — Rich Scriven, Aug 01 '15 at 15:50

score 5 · Answer 1 · answered Aug 01 '15 at 16:24

5

A base R solution with aggregate:

prov <- aggregate(. ~ Year + ID, data = dat, FUN = max)

answered Aug 01 '15 at 16:24

SabDeM

7,050
2
25
38

1

@Liza you are welcome. Consider to accept the answer that fits your goals and next time if you have to add clarifications please do edit your post and do not post an answer (unless of course you find by yourself the solution for your problems ) – SabDeM Aug 01 '15 at 16:44

Rorschach · Accepted Answer · 2015-08-01T16:08:58.573

4

You can use dplyr

library(dplyr)
dat %>% group_by(ID, Year) %>%
  summarise(mval=max(Value1)) -> result

or plyr, keeping all the other columns (and repeating max Value1 as mval)

ddply(dat, .(ID, Year), function(x) {
    transform(x[which.max(x$Value1),], mval=Value1)
  }, .drop=F)

Data

dat <- data.frame(ID=sample(1:10, 100, rep=T),
                  Year=sample(1995:2000, 100, rep=T),
                  Value1=runif(100))

edited Aug 01 '15 at 16:08

answered Aug 01 '15 at 16:00

Rorschach

31,301
5
78
129

Thank you! It did exactly what I wanted i just few min. – Liza Aug 01 '15 at 16:38
2

Or `library(data.table);setDT(dat)[, list(mval= max(Value1)) , .(ID, Year)]` – akrun Aug 01 '15 at 18:46

R: Find max value for column among a subset of a data frame

2 Answers2