finding the index of a max value in R

Question

I have the following data frame called surge:

MeshID StormID       Rate Surge Wind
     1    1412 1.0000E-01  0.01  0.0
     2    1412 1.0000E-01  0.03  0.0
     3    1412 1.0000E-01  0.09  0.0
     4    1412 1.0000E-01  0.12  0.0
     5    1412 1.0000E-01  0.02  0.0
     6    1412 1.0000E-01  0.02  0.0
     7    1412 1.0000E-01  0.07  0.0
     1    1413 1.0000E-01  0.06  0.0
     2    1413 1.0000E-01  0.02  0.0
     3    1413 1.0000E-01  0.05  0.0

I used the following code to find the max value of surge per storm:

MaxSurge <- data.frame(tapply(surge[,4], surge[,2], max))

It returns:

1412 0.12
1413 0.06

This is great, except I'd also like it to include the MeshID value of the row where the surge is at its maximum. I know I can probably use which.max, but I can't quite figure out how to put it into action. I'm VERY new to R programming.

+1 for a well-posed question. It has everything, 1) data, 2) what you tried, 3) how it didn't quite meet your needs. — Joshua Ulrich, Oct 03 '12 at 11:25
`MaxSurge[which.max(MaxSurge[,4]),1]` is the cheap and dirty way. — Carl Witthoft, Oct 03 '12 at 17:25
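which.max can be passed to tapply the same way as max, but it returns the position of the maximum *within* each storm's rows, not a row number of surge. One base R way to get row numbers, sketched below, is to run tapply over the row indices themselves and keep the index whose Surge value is largest. The name `best_row` is illustrative only.

```r
surge <- data.frame(
  MeshID  = c(1:7, 1:3),
  StormID = c(rep(1412, 7), rep(1413, 3)),
  Surge   = c(0.01, 0.03, 0.09, 0.12, 0.02, 0.02, 0.07, 0.06, 0.02, 0.05)
)

# Position of the maximum within each storm's subset (not a row of surge)
tapply(surge$Surge, surge$StormID, which.max)

# Group the row indices by StormID and, within each group, keep the index
# whose Surge value is largest (which.max picks the first one on ties)
best_row <- tapply(seq_len(nrow(surge)), surge$StormID,
                   function(i) i[which.max(surge$Surge[i])])

# Look up MeshID (and the other columns) at those rows
surge[as.vector(best_row), c("StormID", "MeshID", "Surge")]
```

Because which.max stops at the first maximum, ties are resolved in favour of the earliest row; a rank-based approach is needed to keep every tied row.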

score 14 · Answer 1 · answered Oct 04 '12 at 03:31

And a data.table solution for coding elegance

library(data.table)
surge <- as.data.table(surge)
# for each StormID, keep the row of .SD (that storm's rows) with the largest Surge
surge[, .SD[which.max(Surge)], by = StormID]

answered Oct 04 '12 at 03:31

mnel

score 13 · Answer 2 · edited Oct 18 '12 at 12:55

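Here is another data.table solution that does not rely on .SD (and is thus 10x faster):

# rank Surge from largest to smallest within each StormID; tied maxima share rank 1
surge[, grp.ranks := rank(-Surge, ties.method = 'min'), by = StormID]
surge[grp.ranks == 1, ]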

edited Oct 18 '12 at 12:55

Andro Selva

answered Oct 18 '12 at 12:48

massyah

+1 Very nice! When `.I` is added, it'll be easier (and even faster I hope): `surge[ surge[,.I[which.max(surge)],by=StormID,drop=TRUE]]`. That's a bit ugly though so we could auto optimize the `.SD` approach to do that under the hood, to retain the elegance of mnel's answer. So just to note that it is true as you rightly say that `.SD` should be avoided if possible, currently, because it creates the entire subset which might not be needed. But this will hopefully not be true in future. One of the reasons it's all inside `[...]` is so `data.table` can optimize things like this in future. – Matt Dowle Oct 18 '12 at 13:14
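The `.I` symbol mentioned in this comment is now available in data.table: inside a grouped query it holds the row numbers, in the whole table, of the current group's rows. A commonly used form of the idiom, sketched here under the assumption of a current data.table release and written without the `drop` argument from the comment, collects one row number per storm and then subsets once. The name `rows` is illustrative.

```r
library(data.table)

surge <- data.table(
  MeshID  = c(1:7, 1:3),
  StormID = c(rep(1412, 7), rep(1413, 3)),
  Surge   = c(0.01, 0.03, 0.09, 0.12, 0.02, 0.02, 0.07, 0.06, 0.02, 0.05)
)

# For each StormID, .I[which.max(Surge)] is the row number (in the full
# table) of that storm's largest Surge; the grouped result stores it in V1
rows <- surge[, .I[which.max(Surge)], by = StormID]$V1

# Subset the full table once using those row numbers
surge[rows]
```

Like the .SD version, this keeps only the first row when a storm's maximum is tied.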

score 7 · Accepted Answer · answered Oct 03 '12 at 13:08

If two data points tie for the maximum, which.max will only return the first one. A more complete solution would use rank:

# data with a tie for max
surge <- data.frame(MeshID = c(1:7, 1:4),
                    StormID = c(rep(1412, 7), rep(1413, 4)),
                    Surge = c(0.01, 0.03, 0.09, 0.12, 0.02, 0.02, 0.07, 0.06, 0.02, 0.05, 0.06))

# compute ranks within each StormID (Surge is negated so the largest value gets rank 1)
surge$rank <- ave(-surge$Surge, surge$StormID, FUN = function(x) rank(x, ties.method = "min"))
# subset on the rank
subset(surge, rank == 1)
   MeshID StormID Surge rank
4       4    1412  0.12    1
8       1    1413  0.06    1
11      4    1413  0.06    1
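A base R variant that also keeps ties, without computing ranks, is to compare each Surge value with its storm's maximum as returned by ave. This is an alternative sketch, not part of the answer above; it uses the same tied data, and `storm_max` is an illustrative name.

```r
surge <- data.frame(MeshID = c(1:7, 1:4),
                    StormID = c(rep(1412, 7), rep(1413, 4)),
                    Surge = c(0.01, 0.03, 0.09, 0.12, 0.02, 0.02, 0.07,
                              0.06, 0.02, 0.05, 0.06))

# ave() returns, for every row, the maximum Surge of that row's StormID
storm_max <- ave(surge$Surge, surge$StormID, FUN = max)

# Keep every row equal to its storm's maximum (tied rows are all kept)
surge[surge$Surge == storm_max, ]
```

The equality test is exact here because max returns one of the original values rather than a computed result.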

answered Oct 03 '12 at 13:08

James

This worked well - I was concerned about multiple maximum value occurrences. What if I am only concerned about cases where surge > .10? – kimmyjo221 Oct 04 '12 at 15:40
@user1716877 `subset(surge,Surge>0.1)` – James Oct 04 '12 at 15:52
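The rank condition and the threshold can also be combined in a single subset call, keeping only per-storm maxima that exceed 0.1. A small sketch reusing the tied data and the rank column from the answer above:

```r
surge <- data.frame(MeshID = c(1:7, 1:4),
                    StormID = c(rep(1412, 7), rep(1413, 4)),
                    Surge = c(0.01, 0.03, 0.09, 0.12, 0.02, 0.02, 0.07,
                              0.06, 0.02, 0.05, 0.06))

# Rank within each storm (1 = largest Surge; tied maxima share rank 1)
surge$rank <- ave(-surge$Surge, surge$StormID,
                  FUN = function(x) rank(x, ties.method = "min"))

# Per-storm maxima that are also above 0.1
res <- subset(surge, rank == 1 & Surge > 0.1)
res
```

With this data only storm 1412 qualifies, since storm 1413 peaks at 0.06.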

score 6 · Answer 4 · answered Oct 03 '12 at 11:23

Here's a plyr solution, just because someone will say it if I don't...

R> library(plyr)
R> ddply(surge, "StormID", function(x) x[which.max(x$Surge),])
  MeshID StormID Rate Surge Wind
1      4    1412  0.1  0.12    0
2      1    1413  0.1  0.06    0
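Without plyr, the same split-apply-combine pattern can be written in base R with split, lapply and do.call(rbind, ...). This sketch uses the question's data; as with which.max generally, it keeps only the first row on ties. The names `per_storm` and `result` are illustrative.

```r
surge <- data.frame(
  MeshID  = c(1:7, 1:3),
  StormID = c(rep(1412, 7), rep(1413, 3)),
  Rate    = 0.1,
  Surge   = c(0.01, 0.03, 0.09, 0.12, 0.02, 0.02, 0.07, 0.06, 0.02, 0.05),
  Wind    = 0
)

# Split into one data frame per storm and keep the row with the largest Surge
per_storm <- lapply(split(surge, surge$StormID),
                    function(x) x[which.max(x$Surge), ])

# Bind the one-row results back into a single data frame
result <- do.call(rbind, per_storm)
result
```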

edited Oct 03 '12 at 12:23

answered Oct 03 '12 at 11:23

Joshua Ulrich

The two methods seem to have given different results. The `ddply` version works, because inside the function you are indexing a subset of `x`. In the `tapply` version `which.max` returns the index of the maximum in the subset but uses it to index the whole of `x`. – seancarmody Oct 03 '12 at 11:53
Can I ask a further question? If I wanted to count the number of times the max is repeated for a particular stormID, how would I do that? At this point it is just picking the first instance of MeshID for which Surge is a max, correct? What if the max occurs more than once? Thank you. – kimmyjo221 Oct 04 '12 at 15:03
Perfect! Sorry one more question. What if I'm really only interested in those cases where surge > .10? – kimmyjo221 Oct 04 '12 at 15:34
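For the follow-up about how often the maximum is repeated, one sketch is to count, within each storm, how many Surge values equal that storm's maximum. It uses the tied data from the accepted answer; `max_count` is an illustrative name.

```r
surge <- data.frame(MeshID = c(1:7, 1:4),
                    StormID = c(rep(1412, 7), rep(1413, 4)),
                    Surge = c(0.01, 0.03, 0.09, 0.12, 0.02, 0.02, 0.07,
                              0.06, 0.02, 0.05, 0.06))

# For each storm, count the rows whose Surge equals that storm's maximum
max_count <- tapply(surge$Surge, surge$StormID, function(x) sum(x == max(x)))
max_count
```

A count greater than 1 flags the storms where which.max alone would hide a tied MeshID.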
