Multivar linear regression should be mathematically undetermined (Octave)

Question

I apologize in advance for the rather abstract nature of my question, but it is indirectly a question about programming algorithms, and I don't think I'll be the only programmer to wonder about this.

This is about the implementation of the multi-variable ordinary least squares (OLS) regression algorithm in Octave (and, I assume, in MATLAB as well). As far as I can tell, if one inputs two variables into a linear regression with just one single measurement, the result (i.e. the coefficients) should be mathematically undetermined: unless you accept black magic as a valid premise, how could one possibly tell in which way each of the variables affects the final result? In the more general case, the number of measurements must (I think) be at least equal to the number of variables for the resulting coefficients to make any sense (let alone statistical errors and all that).

Octave, however, is all too happy to compute a result, with no warnings whatsoever:

octave:1> ols([1], [1, 1])
ans =

   0.50000
   0.50000

In other words -- if I got this right -- given the equation 1 = x + y, Octave joyfully concludes that x = y = 0.5.

As such, assuming (as I am) that Octave has no direct connection to Satan, here are my questions:

Am I misunderstanding the mathematical foundation? In other words, is this possibly a legitimate result?
If I'm right, why isn't Octave spitting out an error -- or, at the very least, quite a stern warning regarding the totally moronic data I'm asking it to analyze?

Nothing here really about programming, but a question about your understanding of the math. This is a statistics question, or a math question, depending on your point of view. — , Aug 09 '13 at 22:40
If I failed to understand the mathematical foundation, yes, you are correct. But if my understanding is correct (which I strongly think it is), then it's about Octave's behavior, which in turn is used for programming. — Bogdan Stăncescu, Aug 09 '13 at 22:42
No. You misunderstand the mathematics. There are infinitely many solutions to an underdetermined problem. These tools choose one of them, somewhat arbitrarily, although it is based on some reasonable logic for the choice made. And in fact, there is no need to return a warning, as this is something that happens often enough and is used for good reason with no warning needed. — , Aug 09 '13 at 22:53
This question appears to be off-topic because it is about math/statistics. — , Aug 09 '13 at 22:58
This question is about the specific behavior of one specific piece of software -- it is not about a mathematical problem in general. I have already received my answer, so if people are adamant about removing this question I don't really care on a personal level; having said that, I still believe it's relevant as is, for other programmers who might encounter (and wonder about) the (somewhat) unexpected behavior of this piece of software. — Bogdan Stăncescu, Aug 09 '13 at 23:11

score 1 · Accepted Answer · answered Aug 09 '13 at 22:38

Take a look at this Octave documentation:

http://www.gnu.org/software/octave/doc/interpreter/Linear-Least-Squares.html

In the description of the output beta, it says that the value will be the pseudo-inverse of x times y when the matrix is not of full rank (as is the case for your matrix [1, 1]). [0.5; 0.5] is the pseudo-inverse of [1, 1].
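The claim above can be checked directly. This is a minimal sketch, assuming an Octave session in which `ols`, `pinv` and `rank` are available; the variable names `r`, `beta_ols` and `beta_pinv` are chosen here for illustration only.

```octave
x = [1, 1];   % one observation, two regressors
y = 1;        % the single measured response

% x has 2 columns but only rank 1, so it is not of full column rank
r = rank (x);

beta_ols  = ols (y, x);      % the call from the question
beta_pinv = pinv (x) * y;    % what the documentation describes for this case

disp (r)
disp ([beta_ols, beta_pinv])   % the two columns should agree
```

If the two columns printed by the last line match, `ols` is returning the pseudo-inverse solution rather than anything specific to the data.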

Hope that helps!

answered Aug 09 '13 at 22:38

MattG

Fair enough, I hadn't noticed that. – Bogdan Stăncescu Aug 09 '13 at 22:46

score 1 · Answer 2 · answered Aug 09 '13 at 22:40

Your system simply isn't full rank. According to the documentation, ols solves such a system as

b = pinv(x)*y

or, in your case, simply

b = pinv([1 1])

b =

    0.5000
    0.5000

where pinv is the Moore–Penrose pseudoinverse.
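For the row vector in the question, the pseudoinverse can be worked out by hand. The sketch below assumes the standard closed form for a matrix with full row rank, and then checks the result against the minimum-norm property of the Moore–Penrose solution: among all pairs with x + y = 1, it is the one with the smallest x² + y². The symbols A, λ, x and y are notation introduced here for illustration.

```latex
\begin{align*}
A &= \begin{pmatrix} 1 & 1 \end{pmatrix}, \qquad
A^{+} = A^{\mathsf T}\,(A A^{\mathsf T})^{-1}
      = \begin{pmatrix} 1 \\ 1 \end{pmatrix} (2)^{-1}
      = \begin{pmatrix} 0.5 \\ 0.5 \end{pmatrix} \\[6pt]
&\text{minimize } x^2 + y^2 \ \text{ subject to } x + y = 1: \\
&\quad 2x = \lambda,\quad 2y = \lambda
  \;\Longrightarrow\; x = y
  \;\Longrightarrow\; x = y = \tfrac{1}{2}
\end{align*}
```

Both routes give the same pair, which is why the result is symmetric in the two variables.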

answered Aug 09 '13 at 22:40

horchler

Thank you for taking the time to answer; I had to accept MattG's answer because he was the first to offer basically the same explanation for the behavior I described. – Bogdan Stăncescu Aug 09 '13 at 22:45
I +1-ed this for the Moore-Penrose link: I didn't know that was this particular definition's name. Thanks horchler! – MattG Aug 09 '13 at 22:49

score 0 · Answer 3 · answered Aug 09 '13 at 23:26

Ordinary least squares regression seeks the x that best satisfies, in the least-squares sense:

Ax = y

which is usually solved directly using the pseudo-inverse (the first form below requires A to have full column rank, so that A'*A is invertible):

x = inv(A'*A)*A'*y

or

x = pinv(A) * y
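For the matrix from the question, the first form is not usable because A'*A is singular, while the second still is. A minimal sketch, assuming the question's A = [1 1] and y = 1; the names `M`, `r`, `R` and `p` are illustrative:

```octave
A = [1 1];            % the 1-by-2 matrix from the question
M = A' * A;           % normal-equations matrix, here [1 1; 1 1]

r = rank (M);         % 1, although M is 2-by-2, so M is singular
[R, p] = chol (M);    % nonzero p signals that M is not positive definite

x = pinv (A) * 1;     % the pseudoinverse form is still well defined
disp (x)
```

The two-output form of `chol` is used so that a failed factorization is reported through `p` instead of raising an error.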

In the case of a full-rank matrix, we can perform a Cholesky decomposition: R = chol(A'*A) so that (A'*A) = R'R. This can be used as:

Ax = y
A'Ax = A'y
R'Rx = A'y
Rx = R'\(A'y)
x = R\(R'\(A'y))

Note that in the last step, the backslash operator (mldivide) performs a simple forward substitution with R' followed by a back substitution with the triangular matrix R.
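The steps above can be tried on a small full-rank example. This is only a sketch: the data in `A` and `y` are made up for illustration, and the comparison against `pinv` and plain backslash is added here as a sanity check, not taken from the Octave source.

```octave
A = [1 1; 1 2; 1 3];            % 3 observations, 2 regressors, full column rank
y = [1; 2; 2];

R = chol (A' * A);              % upper triangular, with A'*A == R'*R
x_chol = R \ (R' \ (A' * y));   % forward substitution, then back substitution

x_pinv = pinv (A) * y;          % pseudoinverse route, for comparison
x_ls   = A \ y;                 % backslash applied to A directly

disp ([x_chol, x_pinv, x_ls])   % the three columns should agree
```

For this data the normal equations are 3a + 6b = 5 and 6a + 14b = 11, so all three columns should come out close to [2/3; 1/2].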

In fact that's how Octave implements it: http://hg.octave.org/octave/file/tip/scripts/statistics/base/ols.m#l110

There are other ways to solve the system such as iterative methods.

answered Aug 09 '13 at 23:26

Amro

I'm starting to think that the way I asked the question was poorly explained, or maybe my way of thinking is plain wrong... The essence of my question was not regarding Octave's algorithm, but rather regarding Octave's (and possibly MATLAB's) "philosophical" approach. That is, even if some algorithm or another can offer a correct answer to a specific question (which it obviously can), isn't it philosophically/mathematically/professionally wrong to offer that answer without a warning when you (or the algorithm you designed) know the problem itself is undetermined? – Bogdan Stăncescu Aug 09 '13 at 23:39
@Gutza: like woodchips mentioned in the comments above, I see no reason why a warning should be issued. There are infinitely many solutions to the under-determined system you gave (any two numbers that sum to one), and the `ols` function gave you one of them, which is perfectly reasonable... – Amro Aug 09 '13 at 23:45
Why does 1/0 issue a warning? Infinity is a perfectly reasonable answer. – Bogdan Stăncescu Aug 09 '13 at 23:46
No warning is issued for `1/0` in Matlab R2013a. – horchler Aug 09 '13 at 23:48
MATLAB does not issue a warning when dividing by zero. If you don't like it in Octave, it can be silenced: http://www.gnu.org/software/octave/doc/interpreter/Enabling-and-Disabling-Warnings.html . I think the reasoning behind it is that [dividing by zero](http://en.wikipedia.org/wiki/Division_by_zero) is often an indication of a logical error in your code. I think this has a historical aspect to it as well, where on calculators we are used to getting the dreaded `divide by zero` error – Amro Aug 09 '13 at 23:50
I'm sure (or at least hopeful) that you can enable or disable warnings regarding division by zero, both in Octave and in MATLAB. That's a reasonable choice for undetermined results. The point of my original question was whether this was or was not a reasonable option for the user regarding the situation I described -- which certainly amounts to indeterminacy. Since Octave (and possibly MATLAB) doesn't offer the user a chance to be warned that its response is genuinely undetermined, I think this is unexpected, undesired, and worthy of being analyzed on Stack Overflow. – Bogdan Stăncescu Aug 09 '13 at 23:57
