Indexing based on consecutive field values

Question

I have a data.table in R tracking the movement of items within a system. I'd like to group these data based on two fields, ID and Location.

library(data.table)
example <- data.table(ID = rep(LETTERS[1:3], each = 6),
                      Location = c(1,2,3,1,2,1,2,2,2,3,3,1,2,3,3,3,1,3))
example
#     ID Location
#  1:  A        1
#  2:  A        2
#  3:  A        3
#  4:  A        1
#  5:  A        2
#  6:  A        1
#  7:  B        2
#  8:  B        2
#  9:  B        2
# 10:  B        3
# 11:  B        3
# 12:  B        1
# 13:  C        2
# 14:  C        3
# 15:  C        3
# 16:  C        3
# 17:  C        1
# 18:  C        3

What I would like to have as an output is a new column with a number that increments each time the location changes, regardless of what the new location is (i.e. if that location exists elsewhere in the history). Rather the opposite of this question, which increments only within the group.

expected_output <- data.table(ID = rep(LETTERS[1:3], each = 6),
                     Location = c(1,2,3,1,2,1,2,2,2,3,3,1,2,3,3,3,1,3),
                     Group = c(1,2,3,4,5,6,1,1,1,2,2,3,1,2,2,2,3,4))
expected_output
#     ID Location Group
#  1:  A        1     1
#  2:  A        2     2
#  3:  A        3     3
#  4:  A        1     4
#  5:  A        2     5
#  6:  A        1     6
#  7:  B        2     1
#  8:  B        2     1
#  9:  B        2     1
# 10:  B        3     2
# 11:  B        3     2
# 12:  B        1     3
# 13:  C        2     1
# 14:  C        3     2
# 15:  C        3     2
# 16:  C        3     2
# 17:  C        1     3
# 18:  C        3     4

I've tried a few different combinations of by arguments with little luck so far. The closest I seem to be able to get is something with diff which partially correctly shows when the change occurs, but increments internally.

output <- example
output[, Group := 1:.N, by = paste0(ID, Location, diff(Location))]
output
#     ID Location Group
#  1:  A        1     1 
#  2:  A        2     1 # not incrementing/new group
#  3:  A        3     1 # not incrementing/new group
#  4:  A        1     2
#  5:  A        2     1
#  6:  A        1     3
#  7:  B        2     1
#  8:  B        2     2 # incrementing when shouldn't
#  9:  B        2     1 
# 10:  B        3     1 
# 11:  B        3     1
# 12:  B        1     1
# 13:  C        2     1
# 14:  C        3     1
# 15:  C        3     2
# 16:  C        3     1
# 17:  C        1     1
# 18:  C        3     1

At this point, I'm pretty lost, though I'm sure the solution is staring me in the face.

score 3 · Accepted Answer · answered Nov 07 '17 at 20:38

You can group by ID and then take the rleid of Location column:

example[, Group := rleid(Location), ID]
example
#    ID Location Group
# 1:  A        1     1
# 2:  A        2     2
# 3:  A        3     3
# 4:  A        1     4
# 5:  A        2     5
# 6:  A        1     6
# 7:  B        2     1
# 8:  B        2     1
# 9:  B        2     1
#10:  B        3     2
#11:  B        3     2
#12:  B        1     3
#13:  C        2     1
#14:  C        3     2
#15:  C        3     2
#16:  C        3     2
#17:  C        1     3
#18:  C        3     4

all.equal(example, expected_output)
# [1] TRUE

That seems to work perfectly, thank you! I had no knowledge of `rleid` previously. Will accept as soon as SO allows me to. — Gaffi, Nov 07 '17 at 20:42

Indexing based on consecutive field values

1 Answers1