Possible optimization in cython: numpy array

Question

The following is my Cython code for drawing from multivariate normal distribution. I am using loop because each time I have different density. (conLSigma is the Cholesky factor)

This is taking a lot of time because I am taking inverse and Cholesky decomposition for each loop. It is faster than pure python code, but I was wondering if there is any way I can boost the speed more.

from __future__ import division

import numpy as np 

cimport numpy as np 

ctypedef np.float64_t dtype_t

cimport cython
@cython.boundscheck(False)
@cython.wraparound(False)

def drawMetro(np.ndarray[dtype_t, ndim = 2] beta,
              np.ndarray[dtype_t, ndim = 3] H,
              np.ndarray[dtype_t, ndim = 2] Sigma,
              float s):

    cdef int ncons = betas.shape[0]
    cdef int nX = betas.shape[1]
    cdef int con

    cdef np.ndarray betas_cand = np.zeros([ncons, nX], dtype = np.float64)
    cdef np.ndarray conLSigma = np.zeros([nX, nX], dtype = np.float64)

    for con in xrange(ncons):
        conLSigma = np.linalg.cholesky(np.linalg.inv(H[con] + Sigma))
        betas_cand[con] = betas[con] + s * np.dot(conLSigma, np.random.standard_normal(size = nX))

    return(betas_cand)

score 5 · Answer 1 · answered Nov 21 '10 at 06:36

The Cholesky decomposition creates a lower-triangular matrix. This means that close to half of the multiplications done in the np.dot don't need to be done. If you change the line

betas_cand[con] = betas[con] + s * np.dot(conLSigma, np.random.standard_normal(size = nX))

into

tmp = np.random.standard_normal(size = nX)
for i in xrange(nX):
    for j in xrange(i+1):
        betas_cand[con,i] += s * conLSigma[i,j] * tmp[j]

However, you'll also need to change

cdef np.ndarray betas_cand = np.zeros([ncons, nX], dtype = np.float64)

into

cdef np.ndarray betas_cand = np.array(betas)

You could of course use slices for the multiplications, but I'm not sure if it will be faster than the way that I've suggested. Anyway, hopefully you get the idea. I don't think that there is much else that you can do to speed this up.

Thanks for the tip. I was wondering if instead of using np.dot stuff, directly calling C libraries such as BLAS would help. — joon, Nov 21 '10 at 17:46

score 0 · Answer 2 · answered Jun 09 '14 at 15:03

0

what about computing the cholesky decomposition first and inverting the lower triangular matrix after by back substitution. This should be faster than linalg.cholesky(linalg.inv(S)).

answered Jun 09 '14 at 15:03

Axel

1

Possible optimization in cython: numpy array

2 Answers2