4

Given a line function y = a*x + b (a and b are previously known constants), it is easy to calculate the sum-of-squares distance between the line and a window of samples (1, Y1), (2, Y2), ..., (n, Yn) (where Y1 is the oldest sample and Yn is the newest):

sum((Yx - (a*x + b))^2 for x in 1,...,n)

I need a fast algorithm for calculating this value for a rolling window (of length n) - I cannot rescan all the samples in the window every time a new sample arrives.
Obviously, some state should be saved and updated for every new sample that enters the window and every old sample leaves the window.
Notice that when a sample leaves the window, the indecies of the rest of the samples change as well - every Yx becomes Y(x-1). Therefore when a sample leaves the window, every other sample in the window contribute a different value to the new sum: (Yx - (a*(x-1) + b))^2 instead of (Yx - (a*x + b))^2.

Is there a known algorithm for calculating this? If not, can you think of one? (It is ok to have some mistakes due to first-order linear approximations).

Matthew Lundberg
  • 42,009
  • 6
  • 90
  • 112
Oren
  • 2,767
  • 3
  • 25
  • 37

2 Answers2

2

Won't a straightforward approach do the trick?...

By 'straightforward' I mean maintaining a queue of samples. Once a new sample arrives, you would:

  • pop the oldest sample from the queue
  • subtract its distance from your sum
  • append the new sample to the queue
  • calculate its distance and add it to your sum

As for time, everything here is O(1) if the queue is implemented as linked list or something similar, You would want to store the distance with your samples in queue, too, so you calculate it only once. The memory usage is thus 3 floats per sample - O(n).

Xion
  • 22,400
  • 10
  • 55
  • 79
  • Maybe I didn't explain myself enough, but when the window moves the indexes of the points move as well. When Y1 leaves the window, Yx becomes Y(x-1), so now it contributes a different value to the squares sum. – Oren Jan 06 '12 at 00:20
  • Oh! I just noticed your sum formula uses `x` for sample **index**, and the x coordinate thus isn't arbitrary. Damn, I knew it cannot be that simple :) – Xion Jan 06 '12 at 00:27
  • Actually the x-axis is originally not based on round integers, but it's only a matter of scaling. I tried to simplify the problem this way. – Oren Jan 06 '12 at 00:37
1

If you expand the term (Yx - (a*x + b))^2 the terms break into three parts:

  1. Terms of only a,x and b. These produce some constant when summed over n and can be ignored.
  2. Terms of only Yx and b. These can be handled in the style of a boxcar integrator as @Xion described.
  3. One term of -2*Yx*a*x. The -2*a is a constant so ignore that part. Consider the partial sum S = Y1*1 + Y2*2 + Y3*3 ... Yn*n. Given Y1 and a running sum R = Y1 + Y2 + ... + Yn you can find S - R which eliminates Y1*1 and reduces each of the other terms, leaving you with Y2*1 + Y3*2 + ... + Yn*(n-1). Now update the running sum R as for (2) by subtracting off Y1 and adding Y(n+1). Add the new Yn*n term to S.

Now just add up all those partial terms.

Ben Jackson
  • 90,079
  • 9
  • 98
  • 150
  • Great! Now I only have to be careful with the implementation details. Do you know if this problem have a mathematical name? So I could google it? – Oren Jan 06 '12 at 00:52
  • +1. I was about to ask the same, as it looks like this solution could be generalized to any polynomial - or more generally, a function whose k-th derivative is constant for some _k_. (Although it would require progressively more running sums whose updates cascade back to `S` and thus might be hairy on the exact math side) – Xion Jan 06 '12 at 00:56
  • @Oren: No, I didn't even think it was going to be possible. I started writing that explaining how some terms were easy and some were problematical. I was explaining why (3) was a problem when I realized the solution. – Ben Jackson Jan 06 '12 at 03:58