Consider a data set consisting of a grouping variable (here id) and an ordered variable (here date):
(df <- data.frame(
id = rep(1:2,2),
date = 4:1
))
# id date
# 1 1 4
# 2 2 3
# 3 1 2
# 4 2 1
I'm wondering what the easiest way is in data.table to do the equivalent of this dplyr code:
library(dplyr)
df %>%
group_by(id) %>%
filter(min_rank(date)==1)
# Source: local data frame [2 x 2]
# Groups: id
#
# id date
# 1 1 2
# 2 2 1
i.e. for each id, get the first row according to date.
Based on a similar Stack Overflow question (Create an "index" for each element of a group with data.table), I came up with this:
library(data.table)
dt <- data.table(df)
setkey(dt, id, date)
for(k in unique(dt$id)){
dt[id==k, index := 1:.N]
}
dt[index==1,]
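As a hedged aside, the explicit loop over ids can probably be replaced by a single grouped assignment, since := accepts a by argument. The sketch below rebuilds the example table from scratch; the name first_rows is only illustrative.

```r
library(data.table)

# Same toy data as above, keyed so rows are sorted by id, then date
dt <- data.table(id = rep(1:2, 2), date = 4:1)
setkey(dt, id, date)

# Number the rows within each id in one grouped assignment (no loop)
dt[, index := seq_len(.N), by = id]

# Keep the first row of each group
first_rows <- dt[index == 1]
first_rows
```

This still creates a helper column, so it is a tidier version of the loop rather than a true one-liner.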
But it seems like there should be a one-liner for this. Being unfamiliar with data.table, I thought something like this
dt[,,mult="first", by=id]
should work, but alas! The last bit of code seems like it should group by id and then take the first row (which within id would be determined by date, since I've set the keys in this way).
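For what it's worth, mult appears to take effect only when i performs a join, which would explain why leaving i empty does nothing useful. A minimal sketch, assuming the table is keyed on id and date as above, joins on the unique ids and keeps the first match for each; first_by_join is an illustrative name.

```r
library(data.table)

# Same toy data as above, keyed so rows are sorted by id, then date
dt <- data.table(id = rep(1:2, 2), date = 4:1)
setkey(dt, id, date)

# Join on the first key column (id); mult = "first" keeps only the
# first matching row for each id, i.e. the earliest date given the key
first_by_join <- dt[.(unique(dt$id)), mult = "first"]
first_by_join
```

Because the result depends on the key order, this only returns the earliest date when date is the second key column.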
EDIT
Thanks to Ananda Mahto, this one-liner will now be in my data.table repertoire:
dt[,.SD[1], by=id]
# id date
# 1: 1 2
# 2: 2 1
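One caveat worth sketching: .SD[1] relies on the rows already being sorted by date within id (here via setkey), and it returns a single row per group, whereas min_rank(date)==1 keeps all rows tied for the minimum. The variants below are possible alternatives that do not depend on a key; the result names are illustrative only.

```r
library(data.table)

# Same toy data as above, deliberately left unkeyed
dt <- data.table(id = rep(1:2, 2), date = 4:1)

# Keeps every row tied for the earliest date within each id,
# which mirrors filter(min_rank(date) == 1)
res_ties <- dt[, .SD[date == min(date)], by = id]

# Keeps exactly one row per id (the first minimum)
res_one <- dt[, .SD[which.min(date)], by = id]

# Row-index variant: find each group's minimum row number, subset once
res_idx <- dt[dt[, .I[which.min(date)], by = id]$V1]
```

On this data all three give the same two rows; they differ only when a group has duplicate minimum dates.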