I have an array of cartesian points (column 1 is x values and column 2 is y values) like so:

308 522
307 523
307 523
307 523
307 523
307 523
306 523

How would I go about getting a standard deviation of the points? The deviation would be measured against the mean, which would be a straight line. The points do not lie exactly on that line, so the standard deviation would describe how wavy or "off-base" the line segment is relative to the straight line.

I really appreciate the help.

intl
  • well, do you NEED "std deviation" (not that one is defined in 2D anyway)? Because what would be more natural in this case is to draw a best fit line and use the R^2 value (coefficient of determination) to describe how "off-base" the data is from the straight line. – im so confused Oct 04 '12 at 21:45
  • Could you please give me an example of how I would do this? Say the line of best fit would be between the points 300,500 and 310,550. – intl Oct 04 '12 at 22:28
  • Sorry for the extra reply, but essentially what I'm trying to determine is whether the segment through those points is straight or curved, and to find a way to quantify it. – intl Oct 04 '12 at 23:54
  • use this for now, I'll come back to this tomorrow http://en.wikipedia.org/wiki/Coefficient_of_determination – im so confused Oct 05 '12 at 02:22
  • but yeah, if you mean through all those points, R^2 is the standard value used to quantify non-exactness of a linear fit. If you mean if an arc through two points is curved or not, you can curve an arc through two points as much as you want so it doesn't make much sense – im so confused Oct 05 '12 at 02:23
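
As a rough sketch of the R^2 approach suggested in the comments above, the following MATLAB snippet fits a least-squares line to the data from the question and computes the coefficient of determination. The variable names SS_res, SS_tot and R2 are illustrative choices and do not come from the thread.

```matlab
% Data from the question: column 1 is x, column 2 is y.
xy = [308 522; 307 523; 307 523; 307 523; 307 523; 307 523; 306 523];
x_vals = xy(:,1);
y_vals = xy(:,2);

% Least-squares line y = m*x + c.
A = [x_vals ones(size(x_vals))];
sol = A \ y_vals;
y_fit = A * sol;

% Coefficient of determination: 1 - (residual SS) / (total SS).
SS_res = sum((y_vals - y_fit).^2);          % scatter left over after the fit
SS_tot = sum((y_vals - mean(y_vals)).^2);   % scatter about the mean of y
R2 = 1 - SS_res / SS_tot;
```

For this data R2 works out to 7/12, roughly 0.58. Values closer to 1 mean the points lie closer to the fitted line.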

2 Answers

If you are certain the xy data describe a straight line, you'd do the following.

Finding the best-fitting straight line amounts to solving the over-determined linear system Ax = b in a least-squares sense, where

xy = [
308 522
307 523
307 523
307 523
307 523
307 523
306 523];

x_vals = xy(:,1);
y_vals = xy(:,2);

A = [x_vals ones(size(x_vals))];
b = y_vals;

This can be done in MATLAB like so:

sol = A\b;

m = sol(1);
c = sol(2);

What we've done now is find the values for m and c so that the line described by the equation y = mx+c best-fits the data you've given. This best-fit line is not perfect, so it has errors w.r.t. the y-data:

errs = (m*x_vals + c) - y_vals;

The standard deviation of these errors can be computed like so:

>> std(errs)
ans = 
    0.2440

If you want to use the perpendicular distance to the line (Euclidean distance), you'll have to include a geometric factor:

errs = (m*x_vals + c) - y_vals;
errs_perpendicular = errs * cos(atan(m));

Using trig identities this can be reworked to

errs_perpendicular = errs * 1/sqrt(1+m*m);

and of course,

>> std(errs_perpendicular)
ans = 
    0.2182
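
For reference, here is a short derivation of the geometric factor used above. The symbol θ, the angle between the fitted line and the x-axis, is introduced here for illustration and does not appear in the code.

```latex
\tan\theta = m
\quad\Longrightarrow\quad
\cos^2\theta = \frac{1}{1+\tan^2\theta} = \frac{1}{1+m^2}
\quad\Longrightarrow\quad
\cos(\arctan m) = \frac{1}{\sqrt{1+m^2}}
```

The positive root is the right one because arctan(m) always lies strictly between -π/2 and π/2, where the cosine is positive.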

If you are not certain that a straight line fits through the data and/or your xy data essentially describe a point cloud around some common centre, you'd do the following.

Find the center of mass (COM):

COM = mean(xy);

the distances of all points to the COM:

dists = sqrt(sum(bsxfun(@minus, COM, xy).^2,2));

and the standard deviation thereof:

>> std(dists)
ans =  
    0.5059
Rody Oldenhuis
  • +1 - good answer. The only thing that I would do differently is to measure the Euclidean distance between the line and the points, instead of only the distance in the Y dimension – Andrey Rubshtein Oct 05 '12 at 11:22
  • Thanks a lot for your answer. It gives me the standard deviation (I'm doing the straight line portion). Would it be better suited to identify a straight(ish) line using the coefficient of determination instead? – intl Oct 05 '12 at 17:14
  • @Andrey come to think of it, the least squares fit above is also done using y-distances, so ideally, if you're going to use perpendicular distances, you'll also have to do perpendicular fit. This is a tad too complicated I guess for this question... – Rody Oldenhuis Oct 05 '12 at 17:18
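
The perpendicular fit mentioned in the last comment can be sketched with a singular value decomposition. This is a minimal illustration and not part of the original answer: it assumes the orthogonal-fit line passes through the centroid and runs along the direction of largest spread. The names centered, direction, normal and dists_perp are made up for this example.

```matlab
% Data from the question: column 1 is x, column 2 is y.
xy = [308 522; 307 523; 307 523; 307 523; 307 523; 307 523; 306 523];

COM = mean(xy);                       % centroid of the points
centered = bsxfun(@minus, xy, COM);   % shift so the centroid is the origin

% Columns of V are the principal directions of the centered points,
% ordered by decreasing singular value.
[~, ~, V] = svd(centered, 0);
direction = V(:,1);                   % direction of the fitted line
normal    = V(:,2);                   % unit normal to the fitted line

% Signed perpendicular distance of every point to the fitted line.
dists_perp = centered * normal;
std_perp = std(dists_perp);
```

Because both the fit and the residuals use perpendicular distances here, std_perp cannot be larger than the value obtained by projecting the y-residuals of the ordinary least-squares line.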

The mean of a set of two-dimensional values is another two-dimensional value, i.e. it's a point, not a line. This point is also known as the centre of mass, I believe.

It's not entirely clear what standard deviation is in this case, but I think it would make sense to define it in terms of distance from the mean.
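
One possible reading of "standard deviation in terms of distance from the mean" is the root of the average squared distance to the mean point. The snippet below is only a sketch under that assumption. The names centroid, sq_dists and spread are illustrative, and the n-1 divisor is chosen to match what MATLAB's std and var use by default.

```matlab
% Data from the question: column 1 is x, column 2 is y.
xy = [308 522; 307 523; 307 523; 307 523; 307 523; 307 523; 306 523];

centroid = mean(xy);                                  % the 2D mean (a point)
sq_dists = sum(bsxfun(@minus, xy, centroid).^2, 2);   % squared distance of each point to it

% Root of the mean squared distance, with an n-1 divisor.
spread = sqrt(sum(sq_dists) / (size(xy,1) - 1));
```

With this definition spread equals sqrt(var(x) + var(y)). That is a different quantity from the standard deviation of the distances themselves, so the two numbers should not be expected to agree.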

Qnan