The same way you would find the minimum (or maximum) of a quadratic function of a single variable: by setting the first derivative to zero and solving:
diff((y-Xw)' (y-Xw), w) = 0
(Except that this "0" is a row vector with as many elements as w.)
After performing the differentiation we get the following. (Note that ' is the transpose, not a differentiation operator.)
-2y'X + 2w'X'X = 0
We transpose the whole expression (so that 0 becomes a column vector) and divide by two:
-X'y + X'Xw = 0
and finally solve for w (assuming X'X is invertible):
w = (X'X)^-1 X'y
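The derivation above can be checked numerically. The following is only a minimal sketch in Python with NumPy: the random X and y, their sizes, and the variable names are assumptions made for illustration and are not part of the question. It solves X'Xw = X'y, confirms that the gradient -2y'X + 2w'X'X vanishes at that w, and compares the result with NumPy's own least-squares solver.

```python
import numpy as np

# Hypothetical data: 20 observations, 3 features (sizes chosen arbitrarily).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)

# Solve the normal equations X'X w = X'y.
# np.linalg.solve avoids forming the explicit inverse (X'X)^-1.
w = np.linalg.solve(X.T @ X, X.T @ y)

# The gradient -2y'X + 2w'X'X should be (numerically) the zero vector at w.
grad = -2 * y @ X + 2 * w @ X.T @ X

# Reference solution from NumPy's least-squares routine.
w_ref, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(grad, 0, atol=1e-8))
print(np.allclose(w, w_ref))
```

Solving the linear system rather than inverting X'X is a design choice made here for numerical stability; it computes the same w as the formula above whenever X'X is invertible.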
Regarding your second question: alpha is simply the whole expression X(X'X)^-2 X'y, because then X' alpha = X'X(X'X)^-2 X'y = (X'X)^-1 X'y = w. The point is that w can be written as the product of X' and some vector, which means that w is a linear combination of the columns of X' (rows of X).
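As a quick sanity check of this claim, here is a small NumPy sketch. The data, sizes, and names such as `G` and `w_from_alpha` are assumptions for illustration only. It builds alpha = X(X'X)^-2 X'y and verifies that X' alpha reproduces w.

```python
import numpy as np

# Hypothetical data: 20 observations, 3 features.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)

G = X.T @ X                      # X'X, a 3x3 matrix
w = np.linalg.solve(G, X.T @ y)  # w = (X'X)^-1 X'y

# alpha = X (X'X)^-2 X'y, computed with two linear solves
# instead of explicitly inverting and squaring X'X.
alpha = X @ np.linalg.solve(G, np.linalg.solve(G, X.T @ y))

# X' alpha is a linear combination of the rows of X, weighted by alpha.
w_from_alpha = X.T @ alpha

print(alpha.shape)
print(np.allclose(w_from_alpha, w))
```

Note that alpha has one entry per row of X (per observation), whereas w has one entry per column (per feature).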