Determing bounds of a data.table column based on previous values

Question

I have an (apparently simple) problem but I cannot seem to find a way of solving it. Here is a basic setup:

IDS<-c('ID1','ID2','ID3','ID4')
CNT<-1:10;
d<-data.table(merge(CNT,IDS),key=c('y','x'))
setnames(d,colnames(d),c('CNT','ID'))
r<-c(1,1.1,0.9,1.2)
d[,SIGNAL:=r[which(IDS==ID)]*c(62.2,62.2,61.4,61.4,63.4,66.1,62.6,62.6,59.5,57.5),by=ID]

(The example is going to be on the ID1, the r variable is just to give some variability)

The question is the following: I'd like to add two columns which will hold the 'range' within which the signal is fluctuating. The range is a parameter (in this example it is 6 [signal-3, signal+3]. Also the range should remain linear until the signal 'crosses' the bounds (up or down). Then it should reset.

This range should not change until the signal crosses the previous set bounds. Let me work on the example I gave you:

For the case of ID1, I would expect this range to be:

   CNT  ID SIGNAL LOWER.BOUND UPPER.BOUND
 1:   1 ID1   62.2        59.2        65.2
 2:   2 ID1   62.2        59.5        65.2
 3:   3 ID1   61.4        59.2        65.2
 4:   4 ID1   61.4        59.2        65.2
 5:   5 ID1   63.4        59.2        65.2
 6:   6 ID1   66.1        63.1        69.1
 7:   7 ID1   62.6        59.6        65.6
 8:   8 ID1   62.6        59.6        65.6
 9:   9 ID1   59.5        56.5        62.5
10:  10 ID1   57.5        56.5        62.5

So you see, whenever the signal crosses the previous bounds (upper or lower), the bounds are recomputed.

I've tried several methods to be honest with you, but I always find a glitch!. The fact that I have to constantly check with the previous bounds is not the easiest job...

I've tried setting the bounds based on the first ones and adjust whenever there is a crossing of the bounds, but this would not work in the case of CNT=9, ID=ID1. The signal is 59.5 and if I had propagated (na.locf-ed) the first values then the bounds in this case would have been 59.2-65.2 but I need 56.5-62.5.
I tried computing for each signal the corresponding bounds, but then the bounds are not 'linear'.
Then I tried doing this row by row, but it didn't work.
Then I broke my PC. That didn't work either :)

Apologies if the question is too... meaningless.. If you find it interesting and want to contribute, please let me know and I'll try to rephrase it/add more info.

Thank you very much for your help

PS The reason for data.table is that the number of rows are in the order of millions and data.table is by far the best performer from the libraries I've used. I would prefer to stick to a data.table.

N

Roland · Accepted Answer · 2014-02-27T21:09:02.717

Instead of hurting my head, I would write a simple function with Rcpp:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
List funcpp(NumericVector x, double r) {
  const int n(x.size());
  NumericVector lwr(n), upr(n);
  lwr[0] = x[0]-r;
  upr[0] = x[0]+r;
  for (int i=1; i<n; i++)
  {
    if (x[i]<lwr[i-1] || x[i]>upr[i-1])
    {
      lwr[i] = x[i]-r;
      upr[i] = x[i]+r;
    }
    else
    {
      lwr[i] = lwr[i-1];
      upr[i] = upr[i-1];
    }
  }
   return Rcpp::List::create(_["lwr"] = lwr,
                             _["upr"] = upr);
}

Use it in the data.table:

d[, c("lwr", "upr") := funcpp(SIGNAL,3), by=ID]

    CNT  ID SIGNAL   lwr   upr
 1:   1 ID1  62.20 59.20 65.20
 2:   2 ID1  62.20 59.20 65.20
 3:   3 ID1  61.40 59.20 65.20
 4:   4 ID1  61.40 59.20 65.20
 5:   5 ID1  63.40 59.20 65.20
 6:   6 ID1  66.10 63.10 69.10
 7:   7 ID1  62.60 59.60 65.60
 8:   8 ID1  62.60 59.60 65.60
 9:   9 ID1  59.50 56.50 62.50
10:  10 ID1  57.50 56.50 62.50
11:   1 ID2  68.42 65.42 71.42
12:   2 ID2  68.42 65.42 71.42
13:   3 ID2  67.54 65.42 71.42
14:   4 ID2  67.54 65.42 71.42
15:   5 ID2  69.74 65.42 71.42
16:   6 ID2  72.71 69.71 75.71
17:   7 ID2  68.86 65.86 71.86
18:   8 ID2  68.86 65.86 71.86
19:   9 ID2  65.45 62.45 68.45
20:  10 ID2  63.25 62.45 68.45
21:   1 ID3  55.98 52.98 58.98
22:   2 ID3  55.98 52.98 58.98
23:   3 ID3  55.26 52.98 58.98
24:   4 ID3  55.26 52.98 58.98
25:   5 ID3  57.06 52.98 58.98
26:   6 ID3  59.49 56.49 62.49
27:   7 ID3  56.34 53.34 59.34
28:   8 ID3  56.34 53.34 59.34
29:   9 ID3  53.55 53.34 59.34
30:  10 ID3  51.75 48.75 54.75
31:   1 ID4  74.64 71.64 77.64
32:   2 ID4  74.64 71.64 77.64
33:   3 ID4  73.68 71.64 77.64
34:   4 ID4  73.68 71.64 77.64
35:   5 ID4  76.08 71.64 77.64
36:   6 ID4  79.32 76.32 82.32
37:   7 ID4  75.12 72.12 78.12
38:   8 ID4  75.12 72.12 78.12
39:   9 ID4  71.40 68.40 74.40
40:  10 ID4  69.00 68.40 74.40
    CNT  ID SIGNAL   lwr   upr

You must think I am a complete idiot :) Thanks for your reply. I thought it would be easy just with R. Speed is also amazing. PS For info, I named the function 'doRoland', so you are in a sense the godfather :). Again, thank you. — Nikos, Feb 28 '14 at 10:11

Determing bounds of a data.table column based on previous values

1 Answers1