I have an independent variable matrix, X. I would like to either take the trace of the hat matrix computed from X, or find some computational shortcut for getting that trace without actually computing the hat matrix. The issue is that X has 14826 rows.

res = glm_binom.fit()            # glm_binom is a statsmodels binomial GLM
YHatTemp = res.mu                # fitted mean values
HatMatTemp = X * res.pinv_wexog  # X is an np.matrix, so * is matrix multiplication

(alternatively, replace the third line with)

HatMatTemp = X*np.linalg.inv(np.transpose(X)*X)*np.transpose(X)

The above gives me the following:

File "C:\Python27\lib\site-packages\numpy\matrixlib\defmatrix.py", line 341, in __mul__
return N.dot(self, asmatrix(other))
MemoryError

The reason I want the trace of the hat matrix in the first place is to compute the GCV criterion for the purpose of model selection, so upon getting this to work, I'll be repeating this process a fair amount.

I'd greatly appreciate some means of solving or bypassing this problem altogether. Thank you!

MNagy
  • It looks like `X` is the *independent* variable, not the *dependent* one. The trace of the hat matrix is the sum of its eigenvalues. If your matrix `X` has fewer columns than rows and is of full column rank, then this trace will amount to exactly **the number of columns** in `X`. In any other case, e.g. using the ridge penalty, use the singular value decomposition of `X = U S V.T` to establish the trace as a function of `S` (see the sketch after these comments). If this doesn't resolve your issue, it would be great if you could post some fully copy+pasteable code exemplifying what you are doing. – eickenberg May 29 '14 at 10:45
  • For GCV you need the vector of diagonal elements of the hat matrix, which can be calculated without calculating the entire (n, n) hat matrix. – Josef May 29 '14 at 12:16
  • @eickenberg that was helpful. Am I understanding correctly that, via singular value decomposition, the trace of the hat matrix would be equal to the sum of the squares of s_i, where s_i is the i-th diagonal of S from the SVD of X? – MNagy May 29 '14 at 16:30
  • Almost - use `X=USV.T` and write out the whole hat matrix, with the inverse of `X.T.dot(X)` in the middle. All diagonal elements will be of the form `s_i ** 2 / s_i ** 2 == 1`. Also, @user333700 correctly states that you need the whole diagonal to compute GCV (more on that e.g. [here](http://dspace.mit.edu/handle/1721.1/37318)). – eickenberg May 29 '14 at 16:36
  • BTW if you are looking for an implementation of this, you can find one in [`sklearn.linear_model.RidgeCV`](https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/ridge.py#L593). It performs GCV among other possibilities. – eickenberg May 29 '14 at 16:38
  • @eickenberg I may end up resorting to RidgeCV in the future. I tried to go by SVD, but it's giving me the same memory error. Is there a handy formula for calculating h_ii? – MNagy May 29 '14 at 23:17
  • Please tell us how many columns your matrix has; it is important for being able to answer. As for SVD, use the kwarg `full_matrices=False`, otherwise you will generate a 15k x 15k orthogonal matrix, which is not necessary. Again, if you only need the trace of the hat matrix, it is exactly `n_columns` if your matrix is of full rank. If you need the diagonal, then do `(X.T * np.linalg.inv(X.T.dot(X)).dot(X.T)).sum(0)` – eickenberg May 30 '14 at 07:43
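
A minimal sketch of the SVD route from the comments above (the ridge penalty `alpha` and the toy data are illustrative assumptions; with `alpha = 0` every term is `s_i ** 2 / s_i ** 2 == 1`, so the trace reduces to the number of columns):

import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(20, 5)
alpha = 1.0  # hypothetical ridge penalty

# economy-size SVD: avoids building an n_samples x n_samples orthogonal factor
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# trace of the ridge hat matrix X (X'X + alpha I)^-1 X' as a function of s
trace_ridge = (s ** 2 / (s ** 2 + alpha)).sum()

# alpha = 0 is plain OLS: the trace is exactly the number of columns
trace_ols = (s ** 2 / s ** 2).sum()
print(trace_ridge, trace_ols)  # trace_ols == X.shape[1] == 5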

1 Answer


For the hat matrix you show, `hat = X.dot(np.linalg.inv(X.T.dot(X)).dot(X.T))`, the trace is simply rank(X), so if X has full column rank and fewer columns than rows, this is `X.shape[1]`. What you have found here is the degrees of freedom of the unbiased linear estimator, ordinary least squares.

Your intention to perform GCV, however, requires the full diagonal of the hat matrix. You can calculate it without computing the full hat matrix, as follows:

import numpy as np

rng = np.random.RandomState(42)
n_samples, n_features = 20, 5
X = rng.randn(n_samples, n_features)

# full hat matrix -- only feasible for small n_samples
hat = X.dot(np.linalg.inv(X.T.dot(X)).dot(X.T))

hat_diag = np.diagonal(hat)
trace = hat_diag.sum()  # equal to n_features == 5 for full column rank
print(trace)

# diagonal only: never materializes the (n_samples, n_samples) hat matrix
only_diag = np.einsum('ij, ij -> j', X.T, np.linalg.inv(X.T.dot(X)).dot(X.T))

print(np.allclose(only_diag, hat_diag))  # True

As can be seen, `only_diag` contains the diagonal. You can also compute it the simple way, probably with a higher memory footprint:

only_diag = (X.T * np.linalg.inv(X.T.dot(X)).dot(X.T)).sum(0)
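
Continuing the snippet above, a minimal sketch of how this diagonal enters the model-selection criteria the question is after (standard linear-smoother formulas, not code from the original answer; the response `y` is a made-up assumption, in the question it would come from the GLM fit):

y = X.dot(rng.randn(n_features)) + 0.1 * rng.randn(n_samples)  # hypothetical response
y_hat = hat.dot(y)  # fitted values
resid = y - y_hat

# leave-one-out CV weights each residual by its leverage h_ii
loocv = np.mean((resid / (1.0 - only_diag)) ** 2)

# GCV replaces each h_ii by the average leverage trace(H) / n
gcv = np.mean((resid / (1.0 - only_diag.sum() / n_samples)) ** 2)
print(loocv, gcv)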
eickenberg
  • `only_diag = np.einsum('ij, jk, ik -> i', X, np.linalg.inv(X.T.dot(X)), X)` will also work and avoid calculating `np.linalg.inv(X.T.dot(X)).dot(X.T)` which is of the same size as `X`, explicitly. – eickenberg May 30 '14 at 13:55
  • 1
    Another shortcut: `np.linalg.inv(X.T.dot(X)).dot(X.T) == np.linalg.pinv(X)` – Yike Lu Feb 19 '15 at 17:31
  • Definitely. And it is even slightly more general, since it works for column-rank-deficient matrices as well. – eickenberg Feb 19 '15 at 21:32
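
For reference, a quick sketch (fresh toy data; `np.allclose` used for floating-point comparison) verifying that the three routes to the hat-matrix diagonal discussed in this thread agree:

import numpy as np

rng = np.random.RandomState(42)
X = rng.randn(20, 5)

inv_gram = np.linalg.inv(X.T.dot(X))
d1 = (X.T * inv_gram.dot(X.T)).sum(0)              # the answer's simple way
d2 = np.einsum('ij, jk, ik -> i', X, inv_gram, X)  # einsum shortcut
d3 = (X * np.linalg.pinv(X).T).sum(1)              # pinv shortcut

print(np.allclose(d1, d2) and np.allclose(d1, d3))  # True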