I am reading the book "Kernel Methods for Pattern Analysis". The least squares approximation minimises the sum of the squares of the discrepancies:

e=y-Xw

So the goal is to minimize

L(w,S)=(y-Xw)'(y-Xw)

This leads to $$ w=(X'X)^{-1} X'y $$

I understand everything up to this point. But how does it lead to the following? What exactly is α? Is it a constant?

[Image: equation from the book expressing w in terms of α]

CyberPlayerOne

1 Answer

The same way you would find the minimum (or maximum) of a quadratic function of a single variable: by setting the first derivative to zero and solving:

diff((y-Xw)' (y-Xw), w) = 0

(Here the "0" is a row vector with as many elements as w.)
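
Expanding the product first makes the differentiation easier to follow. A sketch of that intermediate step, in the question's notation and using the fact that the scalar w'X'y equals its own transpose y'Xw:

$$ (y-Xw)'(y-Xw) = y'y - 2\,y'Xw + w'X'Xw $$

The term y'y does not depend on w, the derivative of y'Xw with respect to w is the row vector y'X, and the derivative of w'X'Xw is 2w'X'X because X'X is symmetric.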

After performing the differentiation we get the following. (Note that ' denotes the transpose, not a differentiation operator.)

-2y'X + 2w'X'X = 0

We transpose the whole expression (so that 0 becomes a column vector) and divide by two:

-X'y + X'Xw = 0

Finally, assuming X'X is invertible, we solve for w:

w = (X'X)^-1 X'y
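
The result can be checked numerically. The following is a minimal sketch with NumPy on made-up random data (the sizes, the seed, and `w_true` are arbitrary assumptions for illustration, not something from the book); it solves the normal equations and confirms that the gradient expression above vanishes at the solution.

```python
import numpy as np

# Synthetic data, assumed purely for illustration: 20 examples, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
w_true = np.array([1.5, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=20)

# Normal equations: solve (X'X) w = X'y instead of forming the inverse explicitly.
w = np.linalg.solve(X.T @ X, X.T @ y)

# Stationarity condition from the derivation: -X'y + X'Xw should be (numerically) zero.
grad = -X.T @ y + X.T @ X @ w

# Cross-check against NumPy's own least-squares solver.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print("w        =", w)
print("w_lstsq  =", w_lstsq)
print("max |grad| =", np.abs(grad).max())
```

Using `np.linalg.solve` rather than `np.linalg.inv` is a common choice for numerical stability; both correspond to the same formula when X'X is invertible.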

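Regarding your second question: α is simply the whole expression X(X'X)^-2 X'y. Since X'X(X'X)^-2 = (X'X)^-1, we get X'α = X'X(X'X)^-2 X'y = (X'X)^-1 X'y = w. So α is a vector with one entry per row of X, determined by the data X and y. The point is that w can be written as the product of X' and some vector, which means that w is a linear combination of the columns of X' (the rows of X).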
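
To see that this really gives back the same w, here is a small NumPy sketch on random data (the shapes and seed are arbitrary assumptions). It forms α = X(X'X)^-2 X'y explicitly and compares X'α with the least squares solution.

```python
import numpy as np

# Random data, assumed only for this check: 20 examples (rows of X), 3 features.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)

G = X.T @ X                           # X'X, a 3x3 matrix
w = np.linalg.solve(G, X.T @ y)       # w = (X'X)^-1 X'y

G_inv = np.linalg.inv(G)
alpha = X @ G_inv @ G_inv @ X.T @ y   # alpha = X (X'X)^-2 X'y, one entry per row of X

w_from_alpha = X.T @ alpha            # w written as a combination of the rows of X

print("alpha shape:", alpha.shape)
print("max |w - X'alpha| =", np.abs(w - w_from_alpha).max())
```

The explicit inverse is used here only to mirror the formula term by term; it is not meant as a recommended way to compute α in practice.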

CliffordVienna