require(data.table)
set.seed(333)
t <- data.table(old=1002:2001, dif=sample(1:10,1000, replace=TRUE))
t$new <- t$old + t$dif; t$foo <- rnorm(1000); t$dif <- NULL
i <- data.table(id=1:3, start=sample(1000:1990,3))
> i
id start
1: 1 1002
2: 2 1744
3: 3 1656
> head(t)
old new foo
1: 1002 1007 -0.7889534
2: 1003 1004 0.3901869
3: 1004 1014 0.7907947
4: 1005 1011 2.0964612
5: 1006 1007 1.1834171
6: 1007 1015 1.1397910
I would like to delete time points from points
such that only those rows remain where new[i] = old[i-1]
, giving a continuous sequence of some fixed number of time points. Ideally, this would be done for all id
in i
simultaneously, where start
gives the starting points. For example, if we choose n=5
, we should obtain
> head(ans)
id old new foo
1: 1 1002 1007 -0.7889534
2: 1 1007 1015 1.1397910
3: 1 1015 1022 -1.2193670
4: 1 1022 1024 1.2039050
5: 1 1024 1026 0.4388586
6: 2 1744 1750 -0.1368320
where lines 3 to 6 cannot be inferred above and foo
is a stand in for other variables that need to be kept.
Can this be done efficiently in data.table, for example, using a clever combination of joins?
PS. This question is somewhat similar to an an earlier one of mine but I have modified the situation to make it clearer.