1

I have a data.table in R tracking the movement of items within a system. I'd like to group these data based on two fields, ID and Location.

library(data.table)
example <- data.table(ID = rep(LETTERS[1:3], each = 6),
                      Location = c(1,2,3,1,2,1,2,2,2,3,3,1,2,3,3,3,1,3))
example
#     ID Location
#  1:  A        1
#  2:  A        2
#  3:  A        3
#  4:  A        1
#  5:  A        2
#  6:  A        1
#  7:  B        2
#  8:  B        2
#  9:  B        2
# 10:  B        3
# 11:  B        3
# 12:  B        1
# 13:  C        2
# 14:  C        3
# 15:  C        3
# 16:  C        3
# 17:  C        1
# 18:  C        3

What I would like to have as an output is a new column with a number that increments each time the location changes, regardless of what the new location is (i.e. if that location exists elsewhere in the history). Rather the opposite of this question, which increments only within the group.

expected_output <- data.table(ID = rep(LETTERS[1:3], each = 6),
                     Location = c(1,2,3,1,2,1,2,2,2,3,3,1,2,3,3,3,1,3),
                     Group = c(1,2,3,4,5,6,1,1,1,2,2,3,1,2,2,2,3,4))
expected_output
#     ID Location Group
#  1:  A        1     1
#  2:  A        2     2
#  3:  A        3     3
#  4:  A        1     4
#  5:  A        2     5
#  6:  A        1     6
#  7:  B        2     1
#  8:  B        2     1
#  9:  B        2     1
# 10:  B        3     2
# 11:  B        3     2
# 12:  B        1     3
# 13:  C        2     1
# 14:  C        3     2
# 15:  C        3     2
# 16:  C        3     2
# 17:  C        1     3
# 18:  C        3     4

I've tried a few different combinations of by arguments with little luck so far. The closest I seem to be able to get is something with diff which partially correctly shows when the change occurs, but increments internally.

output <- example
output[, Group := 1:.N, by = paste0(ID, Location, diff(Location))]
output
#     ID Location Group
#  1:  A        1     1 
#  2:  A        2     1 # not incrementing/new group
#  3:  A        3     1 # not incrementing/new group
#  4:  A        1     2
#  5:  A        2     1
#  6:  A        1     3
#  7:  B        2     1
#  8:  B        2     2 # incrementing when shouldn't
#  9:  B        2     1 
# 10:  B        3     1 
# 11:  B        3     1
# 12:  B        1     1
# 13:  C        2     1
# 14:  C        3     1
# 15:  C        3     2
# 16:  C        3     1
# 17:  C        1     1
# 18:  C        3     1

At this point, I'm pretty lost, though I'm sure the solution is staring me in the face.

Gaffi
  • 4,307
  • 8
  • 43
  • 73

1 Answers1

3

You can group by ID and then take the rleid of Location column:

example[, Group := rleid(Location), ID]
example
#    ID Location Group
# 1:  A        1     1
# 2:  A        2     2
# 3:  A        3     3
# 4:  A        1     4
# 5:  A        2     5
# 6:  A        1     6
# 7:  B        2     1
# 8:  B        2     1
# 9:  B        2     1
#10:  B        3     2
#11:  B        3     2
#12:  B        1     3
#13:  C        2     1
#14:  C        3     2
#15:  C        3     2
#16:  C        3     2
#17:  C        1     3
#18:  C        3     4

all.equal(example, expected_output)
# [1] TRUE
Psidom
  • 209,562
  • 33
  • 339
  • 356
  • 2
    That seems to work perfectly, thank you! I had no knowledge of `rleid` previously. Will accept as soon as SO allows me to. – Gaffi Nov 07 '17 at 20:42