I'm working with the data.table package in R and trying to get the 'group number' of some data points. Specifically, my data are trajectories: each row describes one observation of the particle I'm tracking, and I want to generate an index for each trajectory from other identifying information I have. With a [, , by] call I can group my data by this identifying information and isolate each trajectory. Is there something similar to .I or .N that gives what I would call the index of the subset?

Here's an example with toy data:

library(data.table)

dt <- data.table(x1 = c(rep(1, 4), rep(2, 4)),
                 x2 = c(1, 1, 2, 2, 1, 1, 2, 2),
                 z  = runif(8))

I need a fast way to get the trajectory index for each observation (here it should be c(1,1,2,2,3,3,4,4)), since my real data set is moderately large.

mbarete

1 Answer

If we need the trajectories (not sure exactly what that means) based on 'x2', we can use rleid, which numbers consecutive runs of equal values:

dt[, Grp := rleid(x2)]
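A minimal runnable check on the question's toy data (set.seed is added here only so z is reproducible; it is not part of the original). Note that rleid(x2) gives the expected answer in this example because every change in x1 happens to coincide with a change in x2 (rows 4 to 5); it would not detect a change in x1 alone.

```r
library(data.table)

# Toy data from the question; set.seed only makes z reproducible
set.seed(1)
dt <- data.table(x1 = c(rep(1, 4), rep(2, 4)),
                 x2 = c(1, 1, 2, 2, 1, 1, 2, 2),
                 z  = runif(8))

# rleid() starts a new id each time x2 differs from the previous row
dt[, Grp := rleid(x2)]
print(dt$Grp)
```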

Or, if we need the group numbers based on both 'x1' and 'x2', .GRP (the running group counter inside by) can be used:

dt[, Grp := .GRP, by = .(x1, x2)]
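A small sketch of the .GRP approach on the same toy data. .GRP is 1 for the first group processed, 2 for the second, and so on; with an unkeyed by, groups are processed in order of first appearance, so here each (x1, x2) combination gets its own trajectory number.

```r
library(data.table)

dt <- data.table(x1 = c(rep(1, 4), rep(2, 4)),
                 x2 = c(1, 1, 2, 2, 1, 1, 2, 2),
                 z  = runif(8))

# One id per distinct (x1, x2) combination, numbered by first appearance
dt[, Grp := .GRP, by = .(x1, x2)]
print(dt[, .(x1, x2, Grp)])
```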

Or this can be done with rleid alone, without by, since rleid accepts several columns (as @Frank mentioned):

dt[, Grp := rleid(x1,x2)]
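The two approaches only agree when rows of the same trajectory are contiguous. The sketch below uses a hypothetical reshuffled version of the toy data (dt2 is illustrative, not from the question) to show the difference: rleid numbers runs of consecutive rows, while .GRP numbers distinct key combinations.

```r
library(data.table)

# Same key values as the question, but rows of each (x1, x2) pair are not adjacent
dt2 <- data.table(x1 = c(1, 2, 1, 2, 1, 2, 1, 2),
                  x2 = c(1, 1, 2, 2, 1, 1, 2, 2))

dt2[, run_id := rleid(x1, x2)]          # new id whenever (x1, x2) changes between rows
dt2[, grp_id := .GRP, by = .(x1, x2)]   # one id per distinct (x1, x2) combination
print(dt2)
```

If the data are sorted by the trajectory keys first (for example with setorder(dt2, x1, x2)), both columns give the same numbering; on unsorted data, .GRP is the one that keeps a single id per trajectory.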

akrun