
I was looking for a way to perform a linear curve fit in JavaScript. I found several libraries, but they don't propagate errors. What I mean is, I have data and associated measurement errors, like:

x = [ 1.0 +/- 0.1, 2.0 +/- 0.1, 3.1 +/- 0.2, 4.0 +/- 0.2 ]
y = [ 2.1 +/- 0.2, 4.0 +/- 0.1, 5.8 +/- 0.4, 8.0 +/- 0.1 ]

Where my notation a +/- b means { value : a, error : b }.

I want to fit this to y = mx + b and find m and b with their propagated errors. I know the Least Squares Method algorithm, which I could implement, but it only takes errors on the y variable, and I have distinct errors in both.

I also could not find a JavaScript library to do that; but if there is an open-source lib in another language, I can inspect it to find out how, and implement it in JS.

Programs like Origin or plotly are able to do this, but I don't know how. The result for this example dataset is:

m = 1.93 +/- 0.11
b = 0.11 +/- 0.30
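For concreteness, this is what the dataset might look like as JavaScript objects using that { value, error } notation:

```javascript
// The example dataset above, with each measurement carrying its own error
const xs = [
  { value: 1.0, error: 0.1 },
  { value: 2.0, error: 0.1 },
  { value: 3.1, error: 0.2 },
  { value: 4.0, error: 0.2 },
];
const ys = [
  { value: 2.1, error: 0.2 },
  { value: 4.0, error: 0.1 },
  { value: 5.8, error: 0.4 },
  { value: 8.0, error: 0.1 },
];
```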
Luan Nico
  • This seems to be a case for [Total Least Squares](https://en.wikipedia.org/wiki/Total_least_squares) – Tomas Langkaas Apr 25 '17 at 18:20
  • 2
    You can take a look at [Numerical Recipes](https://www.fing.edu.uy/if/cursos/fiscomp/extras/numrec/book/bookfpdf.html), section [15-3](https://www.fing.edu.uy/if/cursos/fiscomp/extras/numrec/book/f15.pdf). That version is in FORTRAN; the C version is also [available](http://www.nrbook.com/a/bookcpdf.php) but you need a plugin to see it. As far as I can remember, that book provides simple and clear explanations and sample code that works (not always written in the best style but since you would translate to Javascript, you can improve it). – ConnorsFan Apr 26 '17 at 22:10
  • I second the recommendation by @ConnorsFan, however note that the recipe treats the x- and y-errors as normally distributed variables, which may or may not make sense in your case. – etov Apr 27 '17 at 06:55
  • 1
    @ConnorsFan, that's precisely what I wanted! Thanks a lot :)) If you wouldn't mind putting up an answer, I'd be glad to accept it ;) – Luan Nico Apr 28 '17 at 01:57

2 Answers


The very useful book Numerical Recipes provides a method to fit data to a straight line, with uncertainties in both the X and Y coordinates. It can be found online in these two versions:

  • the [Fortran version](https://www.fing.edu.uy/if/cursos/fiscomp/extras/numrec/book/bookfpdf.html), with the relevant material in [chapter 15](https://www.fing.edu.uy/if/cursos/fiscomp/extras/numrec/book/f15.pdf), section 15-3
  • the [C version](http://www.nrbook.com/a/bookcpdf.php), which requires a browser plugin to view

The method is based on minimizing the χ² (chi-square), which is similar to least squares but takes into account the individual uncertainty of each data point. When the uncertainty σᵢ is on the Y axis only, a weight proportional to 1/σᵢ² is assigned to the point in the calculations. When the data has uncertainties in both the X and Y coordinates, given by σxᵢ and σyᵢ respectively, the fit to a straight line

y(x) = a + b · x

uses a χ2 where each point has a weight proportional to

1 / (σyᵢ² + b² · σxᵢ²)

The detailed method and the code (in C or Fortran) can be found in the book. Due to copyright, I cannot reproduce them here.
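This is not the book's code, but a rough JavaScript sketch of the "effective variance" approach that the weight formula above suggests (the function name and structure are my own): iterate an ordinary weighted least-squares fit, recomputing each point's weight 1/(σyᵢ² + b²·σxᵢ²) from the current slope b until it converges. Note that the parameter errors below come from the standard weighted-LS formulas and tend to come out somewhat larger than the quoted values, since the book's full fitexy treatment handles the uncertainties more carefully.

```javascript
// Effective-variance fit of y = a + b*x with errors in both coordinates.
// The weights depend on the slope b, so we iterate: fit with the current
// weights, update b, and repeat until the result stabilizes.
function fitWithXYErrors(x, sx, y, sy, iterations = 20) {
  let b = 1; // initial slope guess
  let a = 0, sigA = 0, sigB = 0;
  for (let iter = 0; iter < iterations; iter++) {
    let S = 0, Sx = 0, Sy = 0, Sxx = 0, Sxy = 0;
    for (let i = 0; i < x.length; i++) {
      // weight proportional to 1 / (σy² + b²·σx²)
      const w = 1 / (sy[i] * sy[i] + b * b * sx[i] * sx[i]);
      S += w;
      Sx += w * x[i];
      Sy += w * y[i];
      Sxx += w * x[i] * x[i];
      Sxy += w * x[i] * y[i];
    }
    const delta = S * Sxx - Sx * Sx;
    b = (S * Sxy - Sx * Sy) / delta;    // slope
    a = (Sxx * Sy - Sx * Sxy) / delta;  // intercept
    sigA = Math.sqrt(Sxx / delta);      // standard weighted-LS error estimates
    sigB = Math.sqrt(S / delta);
  }
  return { intercept: a, slope: b, interceptError: sigA, slopeError: sigB };
}

// The dataset from the question:
const fit = fitWithXYErrors(
  [1.0, 2.0, 3.1, 4.0], [0.1, 0.1, 0.2, 0.2],
  [2.1, 4.0, 5.8, 8.0], [0.2, 0.1, 0.4, 0.1]
);
console.log(fit.slope.toFixed(2), fit.intercept.toFixed(2));
```

On the question's data this converges to roughly m ≈ 1.93 and b ≈ 0.13, close to the Origin result quoted in the question.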

ConnorsFan

It seems that the least squares (LS) method is indeed a good direction. Given lists of x and y values, least squares returns the values of m and b that minimize $$\sum_{i} (m x_{i} + b - y_{i})^{2}$$

The benefits of the LS method are that you will find the optimal values for the parameters, the computation is fast, and you will probably be able to find an implementation in JavaScript, like this one.
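As a sketch, the closed-form solution of that minimization could look like this in JavaScript (the function name is mine, not from any particular library):

```javascript
// Ordinary least squares for y = m*x + b:
//   m = (n·Σxy − Σx·Σy) / (n·Σx² − (Σx)²)
//   b = (Σy − m·Σx) / n
function leastSquares(xs, ys) {
  const n = xs.length;
  let sumX = 0, sumY = 0, sumXX = 0, sumXY = 0;
  for (let i = 0; i < n; i++) {
    sumX += xs[i];
    sumY += ys[i];
    sumXX += xs[i] * xs[i];
    sumXY += xs[i] * ys[i];
  }
  const m = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
  const b = (sumY - m * sumX) / n;
  return { m, b };
}
```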

Now you should take care of the margins of error that you have. Note that the way you treat the margins of error is more of a "business question" than a mathematical one: different people might choose different treatments based on their needs, and all of those treatments would be equally valid from a mathematical point of view.

Without more knowledge about your needs, I suggest that you turn each point (x, y) into 4 points based on the margins: (x+ex, y+ey), (x−ex, y+ey), (x+ex, y−ey), (x−ex, y−ey), where ex and ey are the error margins of x and y respectively.

The benefits of this representation are that it is simple, it gives weight to the edges of the margin boundaries (which are typically the more sensitive regions), and, best of all, it is a reduction. Hence, once you generate the new values, you can use a regular LS implementation without having to implement such an algorithm on your own.
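A possible sketch of that reduction in JavaScript (assuming the { value, error } notation from the question; the function name is mine): it expands each point into its four margin corners, after which any ordinary LS routine can be applied unchanged to the expanded arrays.

```javascript
// Expand each (x ± ex, y ± ey) point into its four margin corners.
// The result is plain numeric arrays, ready for an ordinary LS fit.
function expandCorners(xs, ys) {
  const outX = [], outY = [];
  for (let i = 0; i < xs.length; i++) {
    for (const signX of [+1, -1]) {
      for (const signY of [+1, -1]) {
        outX.push(xs[i].value + signX * xs[i].error);
        outY.push(ys[i].value + signY * ys[i].error);
      }
    }
  }
  return { x: outX, y: outY };
}
```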

DaL
  • This indeed is a simple and straightforward suggestion, however note it assumes the errors are uniformly distributed. e.g. say that the x of a point is 1 +/- 0.5; then we assume that 1.0 is as likely as 1.5 or 0.5. In reality that's probably not the case - the center is usually more probable. As suggested, you should decide if that's an issue or not. – etov Apr 27 '17 at 08:46
  • There could also be a problem caused by the fact that all data points are given equal weight in the calculations. In methods that I have used, the weight of each point is proportional to 1/σ², where σ is the uncertainty of the point. Not taking that into account will change the results. Points with huge uncertainties should have almost no impact on the results (as if they were not there), which is not the case in the method presented here. – ConnorsFan Apr 28 '17 at 01:28
  • Nice! I like it. Both etov and ConnorsFan's ideas are great. In general, I believe that choice of the constraints to apply should be either derived from the business needs or maximize an objective on empirical data. – DaL Apr 30 '17 at 10:38