4

I´m trying to implement a method that fits a line to a set of points in 2D. I wrote the following code that reads the data from two Array (X, Y coordinate) and should calculate the parameters of the best fitting line with the least squares method. I used the formulas given here: mathworld.wolfram

- (void) linearRegressionOfUserAcceleration 
{
     double avgX = [[_accelBufferX valueForKeyPath:@"@avg.doubleValue"] doubleValue];
     double avgY = [[_accelBufferY valueForKeyPath:@"@avg.doubleValue"] doubleValue];
     int n = _accelBufferX.count;

     double ssX, ssY, ssXY;
     ssX = ssY = ssXY = 0;
     int i;

     // Sum of squares X, Y & X*Y
     for (i = 0; i < n; i++)
     {
         ssX += pow([[_accelBufferX objectAtIndex:i] doubleValue],2);
         ssY += pow([[_accelBufferY objectAtIndex:i] doubleValue],2);
         ssXY += [[_accelBufferX objectAtIndex:i] doubleValue] * [[_accelBufferY objectAtIndex:i] doubleValue];
     }

     ssX = ssX - n * pow(avgX,2);
     ssY = ssY - n * pow(avgY,2);
     ssXY = ssXY - n * avgX * avgY;

     // Best fit of line y_i = a + b * x_i 
     b = ssXY / ssX;
     a = avgY - b * avgX;

     // Correlationcoefficent gives the quality of the estimate: 1 = perfect to 0 = no fit
     corCoeff = pow(ssXY,2) / ssX * ssY;

     NSLog(@"n: %d, a: %f --- b: %f --- cor: %f --- avgX: %f --- avgY: %f --- ssX: %f - ssY: %f - ssXY: %f", n, a, b, corCoeff, avgX, avgY, ssX, ssY, ssXY);
}

I get outputs like this:

  n: 15, a: -0.095204 --- b: 0.929245 --- cor: 3.567163   --- avgX: -0.017827 -- avgY: -0.111770 --- ssX: 2.176048 - ssY: 1.898429 - ssXY: 2.022081

The resulting line does not fit the data at all and although the corelationCoefficient is sometimes bigger than one, which IMHO should never happen if everything works correctly.

Does anyone see any errors in my implementation?

- EDIT -

This is the corrected code, following the tip from CRD. I used this to extract the direction vector of the sampled userAcceleration in the horizontal plane between two steps, to get the step direction.

This worked for me:

- (void) linearRegressionOfUserAcceleration
{
    NSUInteger n = _accelBufferX.count;
    double ax, ay, sX, sY, ssX, ssY, ssXY, avgX, avgY;

    // Sum of squares X, Y & X*Y
    for (NSUInteger i = 0; i < n; i++)
    {
        @synchronized(self) {
            ax = [[_accelBufferX objectAtIndex:i] doubleValue];
            ay = [[_accelBufferY objectAtIndex:i] doubleValue];
        }
        sX += ax;
        sY += ay;
        ssX += ax * ax;
        ssY += ay * ay;
        ssXY += ax * ay;
    }

    avgX = sX / n;
    avgY = sY / n;
    radius = hypot(avgX, avgY);
    ssX = ssX - n * (avgX * avgX);
    ssY = ssY - n * (avgY * avgY);
    ssXY = ssXY - n * avgX * avgY;

    // Best fit of line y_i = a + b * x_i
    b = ssXY / ssX;
    a = (avgY - b * avgX);
    theta = atan2(1, b);


    // Correlationcoefficent gives the quality of the estimate: 1 = perfect to 0 = no fit
    corCoeff = (ssXY * ssXY) / (ssX * ssY);

    NSLog(@"n: %d, a: %f --- b: %f --- cor: %f   --- avgX: %f -- avgY: %f --- ssX: %f - ssY: %f - ssXY: %f", n, a, b, corCoeff, avgX, avgY, ssX, ssY, ssXY);
}
d00d
  • 514
  • 10
  • 20
  • I'm familiar with the `double ssX, ssY, ssXY;` syntax, but `ssX = ssY = ssXY = 0;` is new for me :) – Anne Jun 03 '12 at 09:24
  • If you changed your implementation, can you please update this code for correctness? I just tried it and am getting correlationCoefficients of over 1... – horseshoe7 Dec 20 '13 at 14:56
  • (UPDATE: Just answered my own question... it was the pow(..) function. – horseshoe7 Dec 20 '13 at 15:04

1 Answers1

3

Put in some known data you can check by hand, e.g. {1,1}, {2,2}, {3,3}. Are the averages correct? If so move on to the sums, etc. The error will reveal itself.

On the code itself you can make it clearer, and incidentally more efficient, by dropping the calls to @"avg.doubleValue" and producing all your sums in a single loop:

// Sum of X, Y, X^2, Y^2 & X*Y
for (NSUInteger i = 0; i < n; i++)
{
   double x = [[_accelBufferX objectAtIndex:i] doubleValue];
   double y = [[_accelBufferY objectAtIndex:i] doubleValue];
   sX += x;
   sY += y;
   ssX += x * x;
   ssY += y * y;
   ssXY += x * y;
}
CRD
  • 52,522
  • 5
  • 70
  • 86
  • thx a lot for your help. I changed my implementation like you suggested and now it works like a charm! I can´t tell you what was wrong with the first code, but who cares :) now... sorry for not voting, but I need some more reputation first – d00d Jun 03 '12 at 15:53
  • I accepted the answer. Sorry I´m pretty new here. I although realized that my problem with the correletion coefficint getting bigger than one, was due to canceletion errors in my implementation. I added a factor to the values befor computing the regression and now the values are correct. – d00d Jun 04 '12 at 09:01
  • @D.Hot - No need to apologize, we were all new once and had to learn how the site works. – CRD Jun 04 '12 at 17:31
  • If you changed your implementation, can you please update this code for correctness? I just tried it and am getting correlationCoefficients of over 1... – horseshoe7 Dec 20 '13 at 14:55