2

I have a dataframe df with columns ID, Year, Value1, Value2, Value3 and 21788928 rows. I need to subset the data by Year and IDand find the max Value1 in that subset saving the rest of information of that row, I need to do that for all the combinations of Year and ID(Year goes from 1982 to 2013, ID is from 1 to 28371)

I was trying to do that in a double for loop:

year<-seq(1982, 2013)
cnt=1
for (i in 1:32) {
  for (j in 1:28371)

    A<-df[df$Year==year[i]&df$ID==j,] 
    maxVal[cnt,]<-A[A$Value1==max(A$Value1),]
    cnt=cnt+1
}
}

but it takes way to long. Is there a more efficient way to do that? Maybe using ddply or with.

Jaap
  • 81,064
  • 34
  • 182
  • 193
Liza
  • 1,066
  • 2
  • 16
  • 26

2 Answers2

5

A base R solution with aggregate:

prov <- aggregate(. ~ Year + ID, data = dat, FUN = max)
SabDeM
  • 7,050
  • 2
  • 25
  • 38
  • 1
    @Liza you are welcome. Consider to accept the answer that fits your goals and next time if you have to add clarifications please do edit your post and do not post an answer (unless of course you find by yourself the solution for your problems ) – SabDeM Aug 01 '15 at 16:44
4

You can use dplyr

library(dplyr)
dat %>% group_by(ID, Year) %>%
  summarise(mval=max(Value1)) -> result

or plyr, keeping all the other columns (and repeating max Value1 as mval)

ddply(dat, .(ID, Year), function(x) {
    transform(x[which.max(x$Value1),], mval=Value1)
  }, .drop=F)

Data

dat <- data.frame(ID=sample(1:10, 100, rep=T),
                  Year=sample(1995:2000, 100, rep=T),
                  Value1=runif(100))
Rorschach
  • 31,301
  • 5
  • 78
  • 129