How much space does ridge regression require?

Question

In Haskell, ridge regression can be expressed as:

import Numeric.LinearAlgebra 

createReadout :: Matrix Double -> Matrix Double -> Matrix Double
createReadout a b = oA <\> oB
  where
   μ = 1e-4

   oA = (a <> (tr a)) + (μ * (ident $ rows a))
   oB = a <> (tr b)
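
For reference, the snippet above corresponds to the closed form below. The symbols M, k and W are introduced only for this note and are not in the original code: `a` is taken to be n × M and `b` to be k × M.

```latex
W = \left(A A^{\top} + \mu I_n\right)^{-1} A B^{\top},
\qquad A \in \mathbb{R}^{n \times M},\quad
B \in \mathbb{R}^{k \times M},\quad
W \in \mathbb{R}^{n \times k}
```

Under these assumptions the system handed to `<\>` is n × n whatever the value of M.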

However, this operation is very memory-intensive. Here is a minimal example that requires more than 2 GB on my machine and takes 3 minutes to execute.

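import Prelude hiding ((<>))  -- on GHC >= 8.4 Prelude also exports (<>), which clashes with hmatrix
import Numeric.LinearAlgebra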
import System.Random

createReadout :: Matrix Double -> Matrix Double -> Matrix Double
createReadout a b = oA <\> oB
  where
    mu = 1e-4
    oA = (a <> (tr a)) + (mu * (ident $ rows a))
    oB = a <> (tr b)

teacher :: [Int] -> Int -> Int -> Matrix Double
teacher labelsList cols' correctRow = fromBlocks $ f <$> labelsList
  where ones = konst 1.0 (1, cols')
        zeros = konst 0.0 (1, cols')
        rows' = length labelsList
        f i | i == correctRow = [ones]
            | otherwise = [zeros]

glue :: Element t => [Matrix t] -> Matrix t
glue xs = fromBlocks [xs]

main :: IO ()
main = do

  let n = 1500  -- <- The constant to be increased
      m = 10000
      cols' = 12

  g <- newStdGen

  -- Stub data
  let labels = take m . map (`mod` 10) . randoms $ g :: [Int]
      a = (n >< (cols' * m)) $ take (cols' * m * n) $ randoms g :: Matrix Double
      teachers = zipWith (teacher [0..9]) (repeat cols') labels
      b = glue teachers

  print $ maxElement $ createReadout a b
  return ()

$ cabal exec ghc -- -O2 Test.hs

$ time ./Test
./Test 190.16s user 5.22s system 106% cpu 3:03.93 total

The goal is to increase the constant n to at least n = 4000 while staying within 5 GB of RAM. What is the minimal space that the matrix inversion requires in theory? How can this operation be optimized in terms of space? Can ridge regression be efficiently replaced with a cheaper method?

Are the matrices [sparse](https://en.wikipedia.org/wiki/Sparse_matrix)? That could save you a heck lot of space and time (but you'd probably need a dedicated algorithm like [conjugate gradient](https://en.wikipedia.org/wiki/Conjugate_gradient_method)). — leftaroundabout, Dec 03 '16 at 16:14

score 1 · Accepted Answer · answered Dec 03 '16 at 15:51

Simple Gauss-Jordan elimination only takes space to store the input and output matrices plus constant auxiliary space. If I'm reading correctly, the matrix oA you need to invert is n x n so that's not a problem.

Your memory usage is completely dominated by storing the input matrix a, which takes at least 1500 * 120000 * 8 bytes = 1.34 GiB. With n = 4000 it would take 4000 * 120000 * 8 bytes = 3.58 GiB, which is well over half of your space budget. I don't know what matrix library you are using or how it stores its matrices, but if they are on the Haskell heap then GC effects could easily account for another factor of 2 in space usage.
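
The arithmetic above can be reproduced with a few lines of Haskell. This is a sketch using only base; the helper names `denseBytes`, `toGiB` and `footprint` are made up for illustration, and the figures assume dense storage at 8 bytes per Double with no library or GC overhead.

```haskell
-- | Bytes needed for a dense rows x cols matrix of Double (8 bytes each).
denseBytes :: Int -> Int -> Integer
denseBytes r c = fromIntegral r * fromIntegral c * 8

-- | Convert a byte count to GiB (2^30 bytes).
toGiB :: Integer -> Double
toGiB bytes = fromIntegral bytes / 2 ^ (30 :: Int)

-- | (bytes for a, bytes for oA) in the question's setup,
--   where a is n x (cols' * m) and oA is n x n.
footprint :: Int -> (Integer, Integer)
footprint n = (denseBytes n (cols' * m), denseBytes n n)
  where
    m     = 10000  -- value taken from the question
    cols' = 12     -- value taken from the question
```

In GHCi, `footprint 4000` returns the byte counts for `a` and `oA`. The n x n matrix `oA` comes to 128,000,000 bytes, which is small next to `a`.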

Reid, thank you for your answer. Indeed, I should have mentioned that I am using the hmatrix library, which interfaces with C routines (BLAS and LAPACK). Matrices are stored as Data.Vector.Storable arrays. — penkovsky, Dec 03 '16 at 15:57

score 1 · Answer 2 · answered Dec 06 '16 at 18:04

Well, you can get away with 3*m + n x n space, but I'm not sure how numerically stable this will be.

The basis is the identity

inv(inv(Q) + A'*A) = Q - Q*A'*R*A*Q
where R = inv(I + A*Q*A')

If A is your A matrix and

Q = inv( mu*I*mu*I) = I/(mu*mu)

then the solution to your ridge regression is

inv(inv(Q) + A'*A) * A'*b

A little more algebra shows

inv(inv(Q) + A'*A) = (I - A'*inv(mu2*I + A*A')*A)/mu2
where mu2 = mu*mu

Note that since A is n x m, A*A' is n x n.
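
The "little more algebra" can be spelled out as follows. This is a reconstruction rather than the answerer's own derivation; it treats mu2 as the scalar for which Q = I/mu2, written \mu_2 below.

```latex
\begin{aligned}
\left(\mu_2 I + A^{\top} A\right)^{-1}
  &= Q - Q A^{\top}\left(I + A Q A^{\top}\right)^{-1} A Q,
     \qquad Q = \tfrac{1}{\mu_2} I \\
  &= \tfrac{1}{\mu_2} I
     - \tfrac{1}{\mu_2^{2}}\, A^{\top}\left(I + \tfrac{1}{\mu_2} A A^{\top}\right)^{-1} A \\
  &= \tfrac{1}{\mu_2} I
     - \tfrac{1}{\mu_2}\, A^{\top}\left(\mu_2 I + A A^{\top}\right)^{-1} A \\
  &= \tfrac{1}{\mu_2}\left(I - A^{\top}\left(\mu_2 I + A A^{\top}\right)^{-1} A\right)
\end{aligned}
```

The third line uses \left(I + \tfrac{1}{\mu_2} A A^{\top}\right)^{-1} = \mu_2 \left(\mu_2 I + A A^{\top}\right)^{-1}.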

So one algorithm would be

Compute C = A*A' + mu2*I

Do a Cholesky decomposition of C, i.e. find an upper triangular U such that U'*U = C

Compute the vector y = A'*b

Compute the vector z = A*y

Solve U'*u = z for u, storing u in z

Solve U*v = z for v, storing v in z

Compute w = A'*z

Compute x = (y - w)/mu2.
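
The steps above could be sketched with hmatrix roughly as follows. This is a minimal sketch under stated assumptions, not the answerer's code: the name `ridgeWoodbury` is hypothetical, mu2 is treated as a scalar added to the diagonal (mu2*I), and the second triangular solve is assumed to take the result of the first as its right-hand side. hmatrix's `cholSolve` performs both triangular solves in one call. Unlike the in-place scheme described, this version allocates fresh vectors, so it does not by itself reach the 3*m + n x n bound.

```haskell
import Prelude hiding ((<>))   -- Prelude's (<>) clashes with hmatrix's on GHC >= 8.4
import Numeric.LinearAlgebra

-- | Hypothetical helper: solves (mu2*I + A'*A) x = A'*b for an n x m matrix A
--   while factoring only the n x n matrix C = A*A' + mu2*I.
ridgeWoodbury :: Double -> Matrix Double -> Vector Double -> Vector Double
ridgeWoodbury mu2 a b = scale (1 / mu2) (y - w)     -- x = (y - w)/mu2
  where
    n = rows a
    c = a <> tr a + scale mu2 (ident n)     -- C = A*A' + mu2*I   (n x n)
    u = chol (trustSym c)                   -- upper triangular U with U'*U = C
    y = tr a #> b                           -- y = A'*b           (length m)
    z = a #> y                              -- z = A*y            (length n)
    v = flatten (cholSolve u (asColumn z))  -- solves C*v = z via U' and then U
    w = tr a #> v                           -- w = A'*v           (length m)
```

A quick sanity check on a small input is to compare the result against the direct solve `(tr a <> a + scale mu2 (ident (cols a))) <\> b`; the two should agree up to rounding error.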
