I have a data.table
in R tracking the movement of items within a system. I'd like to group these data based on two fields, ID
and Location
.
library(data.table)
example <- data.table(ID = rep(LETTERS[1:3], each = 6),
Location = c(1,2,3,1,2,1,2,2,2,3,3,1,2,3,3,3,1,3))
example
# ID Location
# 1: A 1
# 2: A 2
# 3: A 3
# 4: A 1
# 5: A 2
# 6: A 1
# 7: B 2
# 8: B 2
# 9: B 2
# 10: B 3
# 11: B 3
# 12: B 1
# 13: C 2
# 14: C 3
# 15: C 3
# 16: C 3
# 17: C 1
# 18: C 3
What I would like to have as an output is a new column with a number that increments each time the location changes, regardless of what the new location is (i.e. if that location exists elsewhere in the history). Rather the opposite of this question, which increments only within the group.
expected_output <- data.table(ID = rep(LETTERS[1:3], each = 6),
Location = c(1,2,3,1,2,1,2,2,2,3,3,1,2,3,3,3,1,3),
Group = c(1,2,3,4,5,6,1,1,1,2,2,3,1,2,2,2,3,4))
expected_output
# ID Location Group
# 1: A 1 1
# 2: A 2 2
# 3: A 3 3
# 4: A 1 4
# 5: A 2 5
# 6: A 1 6
# 7: B 2 1
# 8: B 2 1
# 9: B 2 1
# 10: B 3 2
# 11: B 3 2
# 12: B 1 3
# 13: C 2 1
# 14: C 3 2
# 15: C 3 2
# 16: C 3 2
# 17: C 1 3
# 18: C 3 4
I've tried a few different combinations of by
arguments with little luck so far. The closest I seem to be able to get is something with diff
which partially correctly shows when the change occurs, but increments internally.
output <- example
output[, Group := 1:.N, by = paste0(ID, Location, diff(Location))]
output
# ID Location Group
# 1: A 1 1
# 2: A 2 1 # not incrementing/new group
# 3: A 3 1 # not incrementing/new group
# 4: A 1 2
# 5: A 2 1
# 6: A 1 3
# 7: B 2 1
# 8: B 2 2 # incrementing when shouldn't
# 9: B 2 1
# 10: B 3 1
# 11: B 3 1
# 12: B 1 1
# 13: C 2 1
# 14: C 3 1
# 15: C 3 2
# 16: C 3 1
# 17: C 1 1
# 18: C 3 1
At this point, I'm pretty lost, though I'm sure the solution is staring me in the face.