0

I want give IDX with id and day.

I tried .GRP ,.I ,seq_len(.N) but couldn't get right answer.

dt[,IDX:=.GRP,by=c("id","day")] 

give same IDX if id and day is same like this :

 no  id FIRST_DAY LAST_DAY       day IDX
 1:  46  20060912 20060922   10 days 1
 2: 116  20060921 20060923    0 days 2
 3:  46  20060922 20061001    0 days 3
 4:  46  20061001 20061014    0 days 3 
 5:  46  20070313 20070401 -150 days 4
 6:  46  20070401 20070501    0 days 3
 7:  46  20070501 20070601    0 days 3
 8:  46  20070601 20070613    0 days 3

but I want give different seq IDX even id and day is same.

 no  id FIRST_DAY LAST_DAY       day IDX
 1:  46  20060912 20060922   10 days 1
 2: 116  20060921 20060923    0 days 2
 3:  46  20060922 20061001    0 days 3
 4:  46  20061001 20061014    0 days 3 
 5:  46  20070313 20070401 -150 days 4
 6:  46  20070401 20070501    0 days 5
 7:  46  20070501 20070601    0 days 5
 8:  46  20070601 20070613    0 days 5
PKumar
  • 10,971
  • 6
  • 37
  • 52
zell kim
  • 69
  • 5

1 Answers1

1

You have to use run length encoding in this scenario. For this data.table has a useful function rleid() in its package. take help of this and your problem will be solved.

dt[, IDX := rleid(id, day), ]
> dt
    id FIRST_DAY LAST_DAY      day IDX
1:  46  20060912 20060922   10days   1
2: 116  20060921 20060923    0days   2
3:  46  20060922 20061001    0days   3
4:  46  20061001 20061014    0days   3
5:  46  20070313 20070401 -150days   4
6:  46  20070401 20070501    0days   5
7:  46  20070501 20070601    0days   5
8:  46  20070601 20070613    0days   5

or with baseR

dt$IDX <- with(rle(paste(dt$id, dt$day)), rep(1:length(lengths), lengths))
dt
> dt
    id FIRST_DAY LAST_DAY       day IDX
1:  46  20060912 20060922   10 days   1
2: 116  20060921 20060923    0 days   2
3:  46  20060922 20061001    0 days   3
4:  46  20061001 20061014    0 days   3
5:  46  20070313 20070401 -150 days   4
6:  46  20070401 20070501    0 days   5
7:  46  20070501 20070601    0 days   5
8:  46  20070601 20070613    0 days   5

dput used

dt <- structure(list(id = c(46L, 116L, 46L, 46L, 46L, 46L, 46L, 46L
), FIRST_DAY = c(20060912L, 20060921L, 20060922L, 20061001L, 
20070313L, 20070401L, 20070501L, 20070601L), LAST_DAY = c(20060922L, 
20060923L, 20061001L, 20061014L, 20070401L, 20070501L, 20070601L, 
20070613L), day = c("10days", "0days", "0days", "0days", "-150days", 
"0days", "0days", "0days")), row.names = c(NA, -8L), class = c("data.table", 
"data.frame"), .internal.selfref = <pointer: 0x00000222e2b81ef0>

    id FIRST_DAY LAST_DAY      day
1:  46  20060912 20060922   10days
2: 116  20060921 20060923    0days
3:  46  20060922 20061001    0days
4:  46  20061001 20061014    0days
5:  46  20070313 20070401 -150days
6:  46  20070401 20070501    0days
7:  46  20070501 20070601    0days
8:  46  20070601 20070613    0days
AnilGoyal
  • 25,297
  • 4
  • 27
  • 45