I have the following vectors:

x = [0.0069    0.0052    0.0034    0.0024    0.0001   -0.0013   -0.0003 ...
   -0.0026   -0.0040   -0.0031   -0.0034   -0.0017   -0.0013   -0.0017 ...
   -0.0010   -0.0019   -0.0015   -0.0018   -0.0031   -0.0020   -0.0008 ...
    0.0007    0.0031    0.0036    0.0060]

y = [0.0069    0.0061    0.0044    0.0031    0.0012   -0.0016   -0.0027 ...
   -0.0032   -0.0033   -0.0042   -0.0031   -0.0019   -0.0021   -0.0013 ...
   -0.0007   -0.0021   -0.0020   -0.0011   -0.0028   -0.0033   -0.0011 ...
    0.0018    0.0027    0.0038    0.0051]

I am using robust fitting to get a linear function y = f(x) = m*x + p that best fits y vs. x while ignoring possible outliers:

[b,stats] = robustfit(x,y)

I get a slope m = b(2) = 1.0402 +/- 0.0559

and a y-intercept p = b(1) = 5.1496e-06 +/- 1.6907e-04

The uncertainties are the values I get from stats.se, which are, according to the manual, the standard errors of the coefficient estimates. But as you can see, the uncertainty in the y-intercept is much larger than the estimate itself, which doesn't seem to make any sense (what's the point of using robust fitting if the uncertainties we get are not reliable?). Any help on improving this would be very much appreciated!

Thank you very much in advance!

Will

1 Answer
The standard error for the y-intercept is large relative to the y-intercept itself, but still very small relative to the y-data in this fit. What you can infer statistically is that the intercept is not distinguishable from zero. That isn't a weakness of robust regression - it's a property of your data, which appears to pass approximately through the origin. You can see how small the standard errors are by plotting them:
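You can check this directly: the stats structure returned by robustfit also contains t-statistics (stats.t) and p-values (stats.p) for each coefficient. A minimal sketch (the rough t-value below follows from the numbers you quoted):

```matlab
[b,stats] = robustfit(x,y);
% t-statistic for the intercept: estimate divided by its standard error.
% With b(1) ~ 5.15e-06 and stats.se(1) ~ 1.69e-04, this is roughly 0.03,
% i.e. nowhere near significant.
t_intercept = b(1)/stats.se(1);
% stats.p(1) is the corresponding p-value; a value close to 1 means the
% intercept is statistically indistinguishable from zero.
fprintf('t = %.3f, p = %.3f\n', t_intercept, stats.p(1))
```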

[b,stats] = robustfit(x,y);
m = b(2);  p = b(1);   % slope and y-intercept
scatter(x,y)
hold on
axis equal
grid on
plot(x, m             *x + p            )
plot(x, m             *x + p+stats.se(1),'m--')
plot(x,(m+stats.se(2))*x + p            ,'c--')
plot(x, m             *x + p-stats.se(1),'m--')
plot(x,(m-stats.se(2))*x + p            ,'c--')
legend('Raw data','y=m*x+p','y=m*x+p±stats.se(1)','y=(m±stats.se(2))*x+p','Location','best')

[Figure: robust fit with ± standard-error lines]

Note that these standard errors are not confidence intervals - this plot just illustrates their size.

For the data you've supplied, I'd argue there's no room for meaningfully improving this fit without improving the data. In fact, without specific knowledge of the source of the underlying data I would assume that a fit using OLS regression is just as likely to be the best estimate of the linear relationship as robust regression is.
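If you want to compare, an OLS fit on the same data can be obtained with polyfit, or with regress if you also want confidence intervals; for this dataset the coefficients should come out nearly identical to the robust ones:

```matlab
% OLS fit for comparison (polyfit returns [slope, intercept])
c = polyfit(x,y,1);
m_ols = c(1);  p_ols = c(2);

% Alternatively, regress gives 95% confidence intervals on the coefficients
X = [ones(numel(x),1), x(:)];
[b_ols,bint] = regress(y(:), X);   % b_ols(1) = intercept, b_ols(2) = slope
```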

Will
  • Thank you very much for your answer, Will. That makes sense. I actually tried both OLS regression and robust regression and the results are almost the same, probably because there are not many outliers in the data. – Good Friend of Mine Dec 28 '15 at 01:11