My data consists of values (x, y, z) plus an identifier (id).

d <- data.frame( x=c( 11, 12, 13, 14, 15, 16, 17, 18, 19),
                 y=c( 21, 22, 23, 24, 25, 26, 27, 28, 29),
                 z=c(101,212,307,304,206,102,230,145,315),
                 id=c( 1,  2,  3,  3,  2,  1,  2,  1,  3)
                )
d
   x  y   z id
1 11 21 101  1
2 12 22 212  2
3 13 23 307  3
4 14 24 304  3
5 15 25 206  2
6 16 26 102  1
7 17 27 230  2
8 18 28 145  1
9 19 29 315  3

I need a subset of this data with one row per id: the row where z is smallest within that id
(e.g. 101 is the smallest value of z where id == 1).

   x  y   z id  
1 11 21 101  1  
4 14 24 304  3  
5 15 25 206  2  

I've found a solution that uses unique() to create a vector of unique identifiers, then a for() loop that extracts the subset for each id and searches it for the minimal z value. But since the amount of data is huge, this isn't fast enough.
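For reference, the loop approach described above might look roughly like the sketch below. It is a reconstruction from the description, not the exact code that was run; the names `ids`, `pieces`, `sub` and `res_loop` are made up for illustration.

```r
# Sample data from the question (x and y written as integer ranges for brevity)
d <- data.frame(x  = 11:19,
                y  = 21:29,
                z  = c(101, 212, 307, 304, 206, 102, 230, 145, 315),
                id = c(1, 2, 3, 3, 2, 1, 2, 1, 3))

# Loop over each unique id and keep the row with the smallest z
ids <- unique(d$id)
pieces <- vector("list", length(ids))
for (i in seq_along(ids)) {
  sub <- d[d$id == ids[i], ]               # rows belonging to this id
  pieces[[i]] <- sub[which.min(sub$z), ]   # first row holding the minimal z
}
res_loop <- do.call(rbind, pieces)
```

Each pass compares the whole id column against one identifier, so the work grows with the number of rows times the number of distinct ids, which is probably why it gets slow on large data.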

Any smarter ideas?

1 Answer

data.table is really fast when it comes to such operations. For your question, this seems to work:

library(data.table)
# setDT() converts d to a data.table by reference (no copy is made)
# .SD is the subset of rows for the current id; which.min(z) gives the index of its smallest z
setDT(d)[, .SD[which.min(z)], by = 'id']
#   id  x  y   z
#1:  1 11 21 101
#2:  2 15 25 206
#3:  3 14 24 304
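If subsetting .SD for every group turns out to be slow when there are very many ids, a commonly used variant is to collect the matching row numbers with .I first and subset once at the end. This is a sketch on the sample data, not a benchmarked result; `idx` and `res` are illustrative names.

```r
library(data.table)

# Sample data from the question, built directly as a data.table
d <- data.table(x  = 11:19,
                y  = 21:29,
                z  = c(101, 212, 307, 304, 206, 102, 230, 145, 315),
                id = c(1, 2, 3, 3, 2, 1, 2, 1, 3))

# .I holds the row numbers (within d) of the rows in the current group,
# so .I[which.min(z)] is the row number of the group's smallest z
idx <- d[, .I[which.min(z)], by = id]$V1

# Subset the original table once, using the collected row numbers
res <- d[idx]
```

Here the columns keep their original order (x, y, z, id) and the rows come out in order of first appearance of each id. Like which.min itself, this keeps only the first row when several rows tie for the minimum.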
LyzandeR