
A Python program I am currently working on (Gaussian process classification) is bottlenecked on evaluation of SymPy symbolic matrices, and I can't figure out what, if anything, I can do to speed it up. I've already made sure the other parts of the program are typed properly (in terms of numpy arrays), so calculations between them are properly vectorised, etc.

I looked into SymPy's codegen functions (autowrap and binary_function in particular), but because the entries of my ImmutableMatrix are themselves partial derivatives over elements of a symbolic matrix, there is a long list of 'unhashable' errors that prevents me from using the codegen functionality.
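For reference, autowrap itself works fine for me on a plain scalar expression; a minimal sketch (illustrative expression, not my actual kernel):

import sympy
from sympy.utilities.autowrap import autowrap

a, b = sympy.symbols('a b')
expr = a**2 * sympy.exp(-0.5 * (a - b)**2)

# Compiles the expression to a native extension (needs a working
# f2py/Cython toolchain) and returns a fast callable
fast_f = autowrap(expr, args=(a, b))
print(fast_f(1.0, 2.0))

It is only when the expression is a Matrix whose entries contain derivatives over MatrixSymbol elements, as in the code below, that the unhashable errors appear.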

Another possibility I looked into was Theano. After some initial benchmarks, I found that while it built the initial partial-derivative symbolic matrices much faster, it seemed to be a few orders of magnitude slower at evaluation: the opposite of what I was after.
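For reference, the kind of Theano graph I benchmarked can be sketched like this (names illustrative):

import theano
import theano.tensor as T

Xt    = T.dmatrix('X')    # (n, d) input points
l     = T.dvector('l')    # per-dimension length scales
f_err = T.dscalar('f_err')
n_err = T.dscalar('n_err')

Xl = Xt / l
# Pairwise squared Euclidean distances via broadcasting
sq_dists = T.sum((Xl[:, None, :] - Xl[None, :, :]) ** 2, axis=2)
K = f_err**2 * T.exp(-0.5 * sq_dists) + n_err**2 * T.eye(Xt.shape[0])

# Building the graph is quick; it was the compiled evaluation that came
# out much slower than lambdified SymPy in my benchmarks
K_fn = theano.function([Xt, l, f_err, n_err], K)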

Below is a working snippet extracted from the code I am working on.

import theano  # unused in this snippet; left over from the Theano experiment above
import sympy
from sympy.utilities.autowrap import autowrap, binary_function  # likewise unused here; from the codegen attempt
import numpy as np
import math
from datetime import datetime

# 'Vectorised' squared-Euclidean cdist that can handle symbols/arbitrary types.
# Preliminary benchmarking put it at ~15x faster than a Python list
# comprehension, but still notably slower (I forget the exact figure) than
# scipy's cdist, of course.
def sqeucl_dist(x, xs):
    m = np.sum(np.power(
        np.repeat(x[:,None,:], len(xs), axis=1) -
        np.resize(xs, (len(x), xs.shape[0], xs.shape[1])),
        2), axis=2)
    return m


def build_symbolic_derivatives(X):
    # Pre-calculate derivatives of inverted matrix to substitute values in the Squared Exponential NLL gradient
    f_err_sym, n_err_sym = sympy.symbols("f_err, n_err")

    # (1,n) shape 'matrix' (vector) of length scales for each dimension
    l_scale_sym = sympy.MatrixSymbol('l', 1, X.shape[1])

    # K matrix: squared-exponential kernel plus additive noise on the diagonal
    print("Building sympy matrix...")
    eucl_dist_m = sqeucl_dist(X/l_scale_sym, X/l_scale_sym)
    m = sympy.Matrix(f_err_sym**2 * math.e**(-0.5 * eucl_dist_m) 
                     + n_err_sym**2 * np.identity(len(X)))


    # Element-wise derivative of K matrix over each of the hyperparameters
    print("Getting partial derivatives over all hyperparameters...")
    pd_t1 = datetime.now()
    dK_df   = m.diff(f_err_sym)
    dK_dls  = [m.diff(l_s) for l_s in l_scale_sym]
    dK_dn   = m.diff(n_err_sym)
    print("Took: {}".format(datetime.now() - pd_t1))

    # Lambdify each of the dK/dts to speed up substitutions per optimization iteration
    print("Lambdifying ")
    l_t1 = datetime.now()
    dK_dthetas = [dK_df] + dK_dls + [dK_dn]
    dK_dthetas = sympy.lambdify((f_err_sym, l_scale_sym, n_err_sym), dK_dthetas, 'numpy')
    print("Took: {}".format(datetime.now() - l_t1))
    return dK_dthetas


# Evaluates each dK_dtheta pre-calculated symbolic lambda with current iteration's hyperparameters
def eval_dK_dthetas(dK_dthetas_raw, f_err, l_scales, n_err):
    l_scales = sympy.Matrix(l_scales.reshape(1, len(l_scales)))
    return np.array(dK_dthetas_raw(f_err, l_scales, n_err), dtype=np.float64)


dimensions = 3 
X = np.random.rand(50, dimensions)
dK_dthetas_raw = build_symbolic_derivatives(X)

f_err = np.random.rand()
l_scales = np.random.rand(dimensions)
n_err = np.random.rand()

t1 = datetime.now()
dK_dthetas = eval_dK_dthetas(dK_dthetas_raw, f_err, l_scales, n_err) # ~99.7% of total runtime
print(datetime.now() - t1) 

In this example, 5 symbolic 50x50 matrices are evaluated, i.e. only 12,500 elements in total, and this takes 7 seconds. I've spent quite some time looking for resources on speeding up operations like this, and on translating it all into Theano (at least until I found its evaluation slower in my case), with no luck there either.

Any help greatly appreciated!

asked by mediantis

Comments:
  • Have you looked into using SymEngine to speed up the SymPy operations? EDIT: it can be installed using ``conda install -c conda-forge python-symengine`` – Bjoern Dahlgren Jul 30 '16 at 07:37
  • Thanks for the tip @BjoernDahlgren - I'll look into this. I'm still playing around with it, but using autowrap has improved the evaluation speed ~40-fold; the autowrap process itself now takes up a lot of the time, though it's still a huge improvement over not doing it. – mediantis Jul 30 '16 at 10:37
  • Part of the problem is that the generated function operates on a SymPy matrix (created with `sympy.Matrix`) rather than a numpy matrix. Does it help if you replace `sympy.Matrix` with `np.matrix`? – ken Mar 08 '17 at 04:03

0 Answers