group-wise min or max

Question

There are so many posts on how to get the group-wise min or max with SQL. But how do you do it in R?

Let's say, you have got the following data frame

ID | t | value
a | 1 | 3
a | 2 | 5
a | 3 | 2
a | 4 | 1
a | 5 | 5
b | 2 | 2
b | 3 | 1
b | 4 | 5

For every ID, I don't want the min t, but the value at the min t.

ID | value
a | 3
b| 2

score 5 · Accepted Answer · edited Apr 15 '14 at 15:46

5

df is your data.frame -

library(data.table)

setDT(df) # convert to data.table in place

df[, value[which.min(t)], by = ID]

Output -

> df[, value[which.min(t)], by = ID]
   ID V1
1:  a  3
2:  b  2

edited Apr 15 '14 at 15:46

eddi

49,088
6
104
155

answered Apr 15 '14 at 15:21

TheComeOnMan

12,535
8
39
54

@eddi - I didn't know about setDT. Very nice. And I used `.SD` in a more general sense in case the user wants multiple columns or something, but for just the one column this makes sense too. – TheComeOnMan Apr 15 '14 at 16:11
understood, it's just that currently using `.SD` in such manner results in slow speeds for large number of groups, due to large `[.data.table` overhead and should be avoided unless necessary – eddi Apr 15 '14 at 16:15
Thank you! How can I use this selection for manipulating the data frame, let's say changing the "value" to NA in those two observations (i.e. where "t" is lowest for each ID)? – moabit21 Jan 03 '16 at 18:26

sgibb · Answer 2 · 2014-04-15T15:18:45.737

3

You are looking for tapply:

df <- read.table(textConnection("
ID | t | value
a | 1 | 3
a | 2 | 5
a | 3 | 2
a | 4 | 1
a | 5 | 5
b | 2 | 2
b | 3 | 1
b | 4 | 5"), header=TRUE, sep="|")

m <- tapply(1:nrow(df), df$ID, function(i) {
  df$value[i[which.min(df$t[i])]]
})
# a  b
#  3  2

edited Apr 15 '14 at 15:18

answered Apr 15 '14 at 15:12

sgibb

25,396
3
68
74

1

Don't you need `which.min(df$t[i])` to restrict the range of `t`s you're minimising over. – Gavin Kelly Apr 15 '14 at 15:20

score 0 · Answer 3 · answered Apr 15 '14 at 17:03

0

Two more solutions (with sgibb's df):

sapply(split(df, df$ID), function(x) x$value[which.min(x$t)])

#a  b  
#3  2 

library(plyr)
ddply(df, .(ID), function(x) x$value[which.min(x$t)])

#  ID V1
#1 a   3
#2 b   2

answered Apr 15 '14 at 17:03

WKA

35
4

group-wise min or max

3 Answers3

Linked

Related