Least squares approximation: how is this matrix equation derived?

Question

I am reading the book "Kernel Methods for Pattern Analysis". The least squares approximation minimises the sum of the squares of the discrepancies:

e=y-Xw

Therefore the goal is to minimise

L(w,S)=(y-Xw)'(y-Xw)

Leading to $$ w=(X'X)^{-1} X'y $$

I understand it up to this point. But how does this lead to the following? What exactly is α? Is it a constant?

[image: the book's next equation, which expresses w in terms of α]

Why doesn't the markup for the equation work? – CyberPlayerOne Apr 19 '14 at 11:13
Stack Overflow doesn't support LaTeX. – Bernhard Barker Apr 19 '14 at 11:27

CliffordVienna · Answer 1 · 2014-04-26T01:52:11.320

The same way you would find the minimum (or maximum) of a quadratic function of a single variable: by setting the first derivative to zero:

diff((y-Xw)' (y-Xw), w) = 0

(Here the "0" is a row vector with as many elements as w.)

After performing the differentiation we get the following. (Note that ' denotes the transpose, not differentiation.)

-2y'X + 2w'X'X = 0
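As a sketch of the intermediate step, which is not spelled out here: expanding the product before differentiating, and using the convention that the derivative with respect to w is a row vector, gives the expression above. The expansion uses the fact that y'Xw is a scalar and therefore equals w'X'y.

```latex
\begin{align*}
L(w) &= (y - Xw)'(y - Xw) = y'y - 2\,y'Xw + w'X'Xw, \\
\frac{\partial L}{\partial w} &= -2\,y'X + 2\,w'X'X .
\end{align*}
```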

We transpose the whole expression (so that 0 becomes a column vector) and divide by two:

-X'y + X'Xw = 0

And finally solve for w (assuming X'X is invertible):

w = (X'X)^-1 X'y
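The result can be checked numerically. The following is a minimal sketch using NumPy on made-up random data (the sizes, the seed and the variable names are assumptions for illustration, not from the book). It solves the normal equations X'Xw = X'y, confirms that the gradient vanishes at that w, and compares against NumPy's own least squares routine.

```python
import numpy as np

# Hypothetical data: 20 samples, 3 features (random, so X'X is invertible in practice)
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)

# Normal equations X'X w = X'y; solving is preferred over forming the inverse explicitly
w = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient as a row vector: -2y'X + 2w'X'X, which should be (numerically) zero at w
grad = -2 * y @ X + 2 * w @ X.T @ X

# Reference solution from NumPy's least squares solver
w_ref, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(grad, 0))
print(np.allclose(w, w_ref))
```

Both checks should agree up to floating point error as long as X has full column rank.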

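Regarding your second question: α is simply the whole expression X(X'X)^-2 X'y, where (X'X)^-2 means the inverse of X'X applied twice. Inserting (X'X)(X'X)^-1 = I in front of the solution gives

w = (X'X)^-1 X'y = X'X(X'X)^-2 X'y = X'α

So α is not a free constant: it is a vector with one entry per row of X, fully determined by X and y. The point is that w can be written as the product of X' and some vector, which means that w is a linear combination of the columns of X' (rows of X).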
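The identity w = X'α can also be checked numerically. This is a small sketch with NumPy on made-up random data (sizes, seed and variable names are assumptions for illustration). It forms the inverse explicitly only to mirror the formula; for real problems a solver would be the better choice.

```python
import numpy as np

# Hypothetical data: 15 samples, 4 features
rng = np.random.default_rng(1)
X = rng.normal(size=(15, 4))
y = rng.normal(size=15)

G_inv = np.linalg.inv(X.T @ X)        # (X'X)^-1

w = G_inv @ X.T @ y                   # w = (X'X)^-1 X'y
alpha = X @ G_inv @ G_inv @ X.T @ y   # alpha = X (X'X)^-2 X'y

# w rebuilt as a linear combination of the rows of X, weighted by alpha
w_from_alpha = X.T @ alpha

print(alpha.shape)                    # one entry per sample (row of X)
print(np.allclose(w, w_from_alpha))
```

Note that alpha has as many entries as there are rows in X, while w has as many entries as there are columns.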
