data.table index of subset

Question

Working with the data.table package in R, I'm trying to get the 'group number' of some data points. Specifically, my data is trajectories: each row describes one observation of the particle I'm tracking, and I want to generate an index for each trajectory based on other identifying information I have. With a [, , by] call I can group my data by this identifying information and isolate each trajectory. Is there a symbol, similar to .I or .N, that gives what I would call the index of the subset?

Here's an example with toy data:

library(data.table)

dt <- data.table(x1 = c(rep(1,4), rep(2,4)),
                 x2 = c(1,1,2,2,1,1,2,2),
                 z = runif(8))

I need a fast way to get the trajectory index for each observation (here it should be c(1,1,2,2,3,3,4,4)); my real data set is moderately large.

akrun · Accepted Answer · 2016-03-12T21:09:57.547


If we need the trajectories (not sure what that means) based on 'x2', we can use rleid, which gives a new id to each consecutive run of equal values:

dt[, Grp := rleid(x2)]
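To see why this gives the expected result for the toy data, here is a small sketch of rleid on a plain vector. The variable name run_id is only for illustration. Note that rleid counts consecutive runs, not distinct values, so the second run of 1s gets a new id.

```r
library(data.table)

# Same values as the x2 column in the question
x2 <- c(1, 1, 2, 2, 1, 1, 2, 2)

# rleid() increments the id every time the value changes from the previous row
run_id <- rleid(x2)
print(run_id)
# 1 1 2 2 3 3 4 4
```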

Or if we need the group numbers based on 'x1' and 'x2', .GRP can be used.

dt[, Grp := .GRP, by = .(x1, x2)]

Or this can be done using rleid alone without the by (as @Frank mentioned)

dt[, Grp := rleid(x1,x2)]
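The two approaches agree on the toy data because its rows are already ordered by x1 and x2. A minimal sketch of where they differ, assuming a reordered copy of the data (the column names grp_by and grp_rle and the object shuffled are made up for this illustration): .GRP numbers each distinct (x1, x2) combination in order of first appearance, while rleid starts a new id whenever the combination changes from one row to the next.

```r
library(data.table)

dt <- data.table(x1 = c(rep(1, 4), rep(2, 4)),
                 x2 = c(1, 1, 2, 2, 1, 1, 2, 2))

# On the ordered toy data both give 1 1 2 2 3 3 4 4
dt[, grp_by  := .GRP, by = .(x1, x2)]
dt[, grp_rle := rleid(x1, x2)]

# Interleave the rows so that equal (x1, x2) pairs are no longer adjacent
shuffled <- dt[c(1, 3, 2, 4, 5, 7, 6, 8), .(x1, x2)]
shuffled[, grp_by  := .GRP, by = .(x1, x2)]
shuffled[, grp_rle := rleid(x1, x2)]

print(shuffled)
# grp_by : 1 2 1 2 3 4 3 4  (one id per distinct combination)
# grp_rle: 1 2 3 4 5 6 7 8  (one id per consecutive run)
```

So if the rows of one trajectory may not be contiguous, .GRP with by is probably the safer choice; if the data is ordered by the identifying columns, either works.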

edited Mar 12 '16 at 21:09

answered Mar 12 '16 at 16:13



`.GRP` is probably more intuitive for me but both work – mbarete Mar 13 '16 at 18:30
