I have an (apparently simple) problem but I cannot seem to find a way of solving it. Here is a basic setup:
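library(data.table)
IDS <- c('ID1', 'ID2', 'ID3', 'ID4')
CNT <- 1:10
# merge() with no common columns returns the cross join, with columns x (CNT) and y (IDS)
d <- data.table(merge(CNT, IDS), key = c('y', 'x'))
setnames(d, colnames(d), c('CNT', 'ID'))
# per-ID scaling factor, only there to make the four signals differ
r <- c(1, 1.1, 0.9, 1.2)
d[, SIGNAL := r[which(IDS == ID)] * c(62.2, 62.2, 61.4, 61.4, 63.4, 66.1, 62.6, 62.6, 59.5, 57.5), by = ID]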
(The example below uses ID1; the r vector is just there to give some variability between IDs.)
The question is the following: I'd like to add two columns that hold the 'range' within which the signal is fluctuating. The width of the range is a parameter (in this example it is 6, i.e. [signal-3, signal+3]). The range should remain constant ('linear') until the signal crosses one of the bounds (upper or lower). Then it should reset around the current signal value.
In other words, the range should not change until the signal crosses the previously set bounds. Let me work through the example above:
For the case of ID1, I would expect this range to be:
    CNT  ID SIGNAL LOWER.BOUND UPPER.BOUND
 1:   1 ID1   62.2        59.2        65.2
 2:   2 ID1   62.2        59.2        65.2
 3:   3 ID1   61.4        59.2        65.2
 4:   4 ID1   61.4        59.2        65.2
 5:   5 ID1   63.4        59.2        65.2
 6:   6 ID1   66.1        63.1        69.1
 7:   7 ID1   62.6        59.6        65.6
 8:   8 ID1   62.6        59.6        65.6
 9:   9 ID1   59.5        56.5        62.5
10:  10 ID1   57.5        56.5        62.5
So you see, whenever the signal crosses the previous bounds (upper or lower), the bounds are recomputed.
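To make the rule unambiguous, here is a minimal sketch of it as a plain loop over one ID's signal vector. The names `reset_bounds` and `half_width` are made up for illustration, and I am assuming that "crossing" means the signal falls strictly outside the current range. This only pins down the expected output; it is not meant as an efficient solution for millions of rows.

```r
# Hypothetical helper: illustrates the reset rule only, not a tuned solution.
# Assumption: a "crossing" means the signal is strictly outside the range.
reset_bounds <- function(signal, half_width = 3) {
  n <- length(signal)
  lower <- numeric(n)
  upper <- numeric(n)
  centre <- signal[1]  # the first range is centred on the first value
  for (i in seq_len(n)) {
    # Re-centre only when the signal leaves the current range.
    if (signal[i] < centre - half_width || signal[i] > centre + half_width) {
      centre <- signal[i]
    }
    lower[i] <- centre - half_width
    upper[i] <- centre + half_width
  }
  list(lower = lower, upper = upper)
}

# The ID1 signal from the setup above.
sig <- c(62.2, 62.2, 61.4, 61.4, 63.4, 66.1, 62.6, 62.6, 59.5, 57.5)
b <- reset_bounds(sig)
```

On the data.table from the setup, something like `d[, c('LOWER.BOUND', 'UPPER.BOUND') := reset_bounds(SIGNAL), by = ID]` should apply the same rule to each ID separately.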
I've tried several methods, to be honest with you, but I always find a glitch! Having to constantly check against the previous bounds is not the easiest job...
- I've tried setting the bounds based on the first signal value and adjusting whenever the signal crosses them, but this does not work for CNT=9, ID=ID1. The signal there is 59.5, and if I had propagated (na.locf-ed) the first values, the bounds would have been 59.2-65.2, but I need 56.5-62.5.
- I tried computing the corresponding bounds for each signal value separately, but then the bounds are not 'linear' (constant between crossings).
- Then I tried doing this row by row, but it didn't work.
- Then I broke my PC. That didn't work either :)
Apologies if the question is too... meaningless.. If you find it interesting and want to contribute, please let me know and I'll try to rephrase it/add more info.
Thank you very much for your help
PS The reason for data.table is that the number of rows is on the order of millions, and data.table is by far the best performer of the libraries I've used. I would prefer to stick to data.table.
N