
In my research I'm tackling the Kolmogorov backward equation, i.e. I'm interested in the operator $$Af = b(x)f'(x)+\sigma(x)f''(x)$$

For specific b(x) and \sigma(x), I'm trying to see how fast the coefficients of the expression grow when computing higher powers of A applied to f. I'm struggling to derive this analytically, so I tried to observe the trend empirically.
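For concreteness, here is a minimal sketch of what repeatedly applying A looks like in SymPy (with b and g standing in for b(x) and \sigma(x), and f left unspecified):

from sympy import Symbol, Function, expand

x = Symbol('x')
b = Function('b')(x)   # stands in for b(x)
g = Function('g')(x)   # stands in for sigma(x)
f = Function('f')(x)

def A(expr):
    # one application of the generator: A expr = b*expr' + g*expr''
    return b*expr.diff(x) + g*expr.diff(x, 2)

print(expand(A(A(f))))   # A^2 f: the number of terms grows quickly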

First, I used SymPy:

from sympy import *
import matplotlib.pyplot as plt
import re
import math
import numpy as np
import time
np.set_printoptions(suppress=True)

x = Symbol('x')
b = Function('b')(x)
g = Function('g')(x)

def new_coef(gamma, beta, coef_minus2, coef_minus1, coef):
    return expand(simplify(gamma*coef_minus2 + beta*coef_minus1 + 2*gamma*coef_minus1.diff(x)\
            +beta*coef.diff(x)+gamma*coef.diff(x,2)))
def new_coef_first(gamma, beta, coef):
    return expand(simplify(beta*coef.diff(x)+gamma*coef.diff(x,2)))
def new_coef_second(gamma, beta, coef_minus1, coef):
    return expand(simplify(beta*coef_minus1 + 2*gamma*coef_minus1.diff(x)\
            +beta*coef.diff(x)+gamma*coef.diff(x,2)))
def new_coef_last(gamma, beta, coef_minus2):
    return expand(simplify(gamma*coef_minus2))
def new_coef_second_to_last(gamma, beta, coef_minus2, coef_minus1):
    return expand(simplify(gamma*coef_minus2 + beta*coef_minus1 + 2*gamma*coef_minus1.diff(x)))

def set_to_zero(expression):
    # b is treated as (at most) linear and g as (at most) quadratic, so these derivatives vanish
    expression = expression.subs(Derivative(b, x, x, x), 0)
    expression = expression.subs(Derivative(b, x, x), 0)
    expression = expression.subs(Derivative(g, x, x, x, x), 0)
    expression = expression.subs(Derivative(g, x, x, x), 0)
    return expression

def sum_of_coef(expression):
    sum_of_coef = 0
    for i in str(expression).split(' + '):
        if i[0:1] == '(':
            i = i[1:]
        integers = re.findall(r'\b\d+\b', i)
        if len(integers) > 0:
            length_int = len(integers[0])
            if i[0:length_int] == integers[0]:
                sum_of_coef += int(integers[0])
            else:
                sum_of_coef += 1
        else:
            sum_of_coef += 1
    return sum_of_coef
power = 6
charar = np.zeros((power, power*2), dtype=Symbol)
coef_sum_array = np.zeros((power, power*2))
charar[0,0] = b
charar[0,1] = g
coef_sum_array[0,0] = 1
coef_sum_array[0,1] = 1
for i in range(1, power):
    #print(i)
    for j in range(0, (i+1)*2):
        #print(j, ':')
        #start_time = time.time()
        if j == 0:
            charar[i,j] = set_to_zero(new_coef_first(g, b, charar[i-1, j]))
        elif j == 1:
            charar[i,j] = set_to_zero(new_coef_second(g, b, charar[i-1, j-1], charar[i-1, j]))
        elif j == (i+1)*2-2:
            charar[i,j] = set_to_zero(new_coef_second_to_last(g, b, charar[i-1, j-2], charar[i-1, j-1]))
        elif j == (i+1)*2-1:
            charar[i,j] = set_to_zero(new_coef_last(g, b, charar[i-1, j-2]))
        else:
            charar[i,j] = set_to_zero(new_coef(g, b, charar[i-1, j-2], charar[i-1, j-1], charar[i-1, j]))
        #print("--- %s seconds for expression---" % (time.time() - start_time))
        #start_time = time.time()
        coef_sum_array[i,j] = sum_of_coef(charar[i,j])
        #print("--- %s seconds for coeffiecients---" % (time.time() - start_time))
coef_sum_array

Then, looked into automated differentiation and used autograd:

import autograd.numpy as np
from autograd import grad
import time
np.set_printoptions(suppress=True)

b = lambda x: 1 + x
g = lambda x: 1 + x + x**2

def new_coef(gamma, beta, coef_minus2, coef_minus1, coef):
    return lambda x: gamma(x)*coef_minus2(x) + beta(x)*coef_minus1(x) + 2*gamma(x)*grad(coef_minus1)(x)\
            +beta(x)*grad(coef)(x)+gamma(x)*grad(grad(coef))(x)
def new_coef_first(gamma, beta, coef):
    return lambda x: beta(x)*grad(coef)(x)+gamma(x)*grad(grad(coef))(x)
def new_coef_second(gamma, beta, coef_minus1, coef):
    return lambda x: beta(x)*coef_minus1(x) + 2*gamma(x)*grad(coef_minus1)(x)\
            +beta(x)*grad(coef)(x)+gamma(x)*grad(grad(coef))(x)
def new_coef_last(gamma, beta, coef_minus2):
    return lambda x: gamma(x)*coef_minus2(x)
def new_coef_second_to_last(gamma, beta, coef_minus2, coef_minus1):
    return lambda x: gamma(x)*coef_minus2(x) + beta(x)*coef_minus1(x) + 2*gamma(x)*grad(coef_minus1)(x)

power = 6
coef_sum_array = np.zeros((power, power*2))
coef_sum_array[0,0] = b(1.0)
coef_sum_array[0,1] = g(1.0)
charar = [b, g]
for i in range(1, power):
    print(i)
    charar_new = []
    for j in range(0, (i+1)*2):
        if j == 0:
            new_funct = new_coef_first(g, b, charar[j])
        elif j == 1:
            new_funct = new_coef_second(g, b, charar[j-1], charar[j])
        elif j == (i+1)*2-2:
            new_funct = new_coef_second_to_last(g, b, charar[j-2], charar[j-1])
        elif j == (i+1)*2-1:
            new_funct = new_coef_last(g, b, charar[j-2])
        else:
            new_funct = new_coef(g, b, charar[j-2], charar[j-1], charar[j])
        coef_sum_array[i,j] = new_funct(1.0)
        charar_new.append(new_funct)
    charar = charar_new
coef_sum_array

However, I'm not happy with the speed of either of them. I would like to do at least a thousand iterations, but after 3 days of running the SymPy method I got only 30 :/

I expect that the second (numerical) method could be optimized to avoid recalculating the expressions every time, but I cannot see how to do that myself. I have also tried Maple, but again without luck.

Seanny123

2 Answers


Overview

So, there are two formulas about derivatives that are interesting here:

  1. Faa di Bruno's formula, which is a way to quickly find the n-th derivative of f(g(x)), and looks a lot like the Multinomial theorem
  2. The General Leibniz rule which is a way to quickly find the n-th derivative of f(x)*g(x) and looks a lot like the Binomial theorem

Both of these have been discussed in pull request #13892, where the n-th derivative was sped up using the general Leibniz rule.
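As a small side check (not part of the original code), the general Leibniz rule can be verified directly in SymPy:

from sympy import symbols, Function, binomial, diff, expand

x = symbols('x')
f = Function('f')(x)
g = Function('g')(x)
n = 4

# general Leibniz rule: (f*g)^(n) = sum_k C(n,k) * f^(k) * g^(n-k)
leibniz = sum(binomial(n, k) * f.diff(x, k) * g.diff(x, n - k) for k in range(n + 1))
assert expand(diff(f*g, x, n) - leibniz) == 0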

I'm trying to see how fast the coefficients of the expression are growing

In your code, the general formula for computing c[i][j] is this:

c[i][j] = g * c[i-1][j-2] + b * c[i-1][j-1] + 2 * g * c'[i-1][j-1] + g * c''[i-1][j]

(where c'[i][j] and c''[i][j] denote the 1st and 2nd derivatives of c[i][j])

Because of this, and by the Leibniz rule mentioned above, my intuition is that the computed coefficients should be related to Pascal's triangle (or at the very least have some combinatorial relation).

Optimization #1

In the original code, the function sum_of_coef(f) serializes the expression f to a string, discards everything that doesn't look like a number, and then sums the remaining numbers.

We can avoid serialization here by traversing the expression tree directly and collecting what we need:

def sum_of_coef(f):
    s = 0
    if f.func == Add:
        for sum_term in f.args:
            res = sum_term if sum_term.is_Number else 1
            if len(sum_term.args) == 0:
                s += res
                continue
            first = sum_term.args[0]
            if first.is_Number == True:
                res = first
            else:
                res = 1
            s += res
    elif f.func == Mul:
        first = f.args[0]
        if first.is_Number == True:
            s = first
        else:
            s = 1
    elif f.func == Pow:
        s = 1
    return s
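As a quick sanity check (assuming the question's from sympy import * and the sum_of_coef above are in scope; the expression here is made up just for illustration):

x = Symbol('x')
b = Function('b')(x)
g = Function('g')(x)

expr = 3*b*g.diff(x) + b.diff(x)*g + 5*g**2
print(sum_of_coef(expr))   # numeric leading factors are summed, other terms count as 1: 3 + 1 + 5 = 9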

Optimization #2

In the function set_to_zero(expr) all the 2nd and 3rd derivatives of b, and the 3rd and 4th derivatives of g are replaced by zero.

We can collapse all those substitutions into one statement like so:

b3,b2=b.diff(x,3),b.diff(x,2)
g4,g3=g.diff(x,4),g.diff(x,3)

def set_to_zero(expression):
    expression = expression.subs({b3:0,b2:0,g4:0,g3:0})
    return expression

Optimization #3

In the original code, for every cell c[i][j] we're calling simplify. That call turns out to have a big impact on performance, but we can actually skip it, because our expressions are just sums of products of the unknown functions and their derivatives.

So the line

charar[i,j] = set_to_zero(expand(simplify(expr)))

becomes

charar[i,j] = set_to_zero(expand(expr))

Optimization #4

The following was also tried but turned out to have very little impact.

For two consecutive values of j, we're computing c'[i-1][j-1] twice.

j-1       c[i-1][j-3] c[i-1][j-2] c[i-1][j-1]
  j                   c[i-1][j-2] c[i-1][j-1] c[i-1][j]

If you look at the loop formula in the else branch, you see that c'[i-1][j-1] has already been computed. It can be cached, but this optimization has little effect in the SymPy version of the code.
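A minimal sketch of what such a cache could look like (hypothetical helper, assuming the x symbol from the question):

# cache of first derivatives of charar[i][j], keyed by (i, j)
deriv_cache = {}

def d1(i, j, expr):
    # compute charar[i][j].diff(x) at most once per cell
    if (i, j) not in deriv_cache:
        deriv_cache[(i, j)] = expr.diff(x)
    return deriv_cache[(i, j)]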

It's also worth mentioning that we can visualize SymPy's call tree involved in computing these derivatives. The full tree is larger, but here is part of it:

nth derivative call tree

We can also generate a flamegraph using the py-spy module just to see where time is being spent:

flamegraph

As far as I could tell, 34% of the time is spent in _eval_derivative_n_times, 10% in the function getit from assumptions.py, 12% in subs(..), and 12% in expand(..).

Optimization #5

Apparently when pull request #13892 was merged into SymPy, it also introduced a performance regression.

In one of the comments regarding that regression, Ondrej Certik recommends using SymEngine to improve performance of code that makes heavy use of derivatives.

So I've ported the code mentioned to SymEngine.py and noticed that it runs 98 times faster than the SymPy version for power=8 (and 4320 times faster for power=30).

The required module can be installed via pip3 install --user symengine.

#!/usr/bin/python3
from symengine import *
import pprint
x=var("x")
b=Function("b")(x)
g=Function("g")(x)

b3,b2=b.diff(x,3),b.diff(x,2)
g4,g3=g.diff(x,4),g.diff(x,3)

def set_to_zero(e):
    e = e.subs({b3:0,b2:0,g4:0,g3:0})
    return e

def sum_of_coef(f):
    s = 0
    if f.func == Add:
        for sum_term in f.args:
            res = 1
            if len(sum_term.args) == 0:
                s += res
                continue
            first = sum_term.args[0]
            if first.is_Number == True:
                res = first
            else:
                res = 1
            s += res
    elif f.func == Mul:
        first = f.args[0]
        if first.is_Number == True:
            s = first
        else:
            s = 1
    elif f.func == Pow:
        s = 1
    return s

def main():
    power = 8
    charar = [[0] * (power*2) for x in range(power)]
    coef_sum_array = [[0] * (power*2) for x in range(power)]
    charar[0][0] = b
    charar[0][1] = g
    init_printing()
    for i in range(1, power):
        jmax = (i+1)*2
        for j in range(0, jmax):
            c2,c1,c0 = charar[i-1][j-2],charar[i-1][j-1],charar[i-1][j]
            #print(c2,c1,c0)
            if   j == 0:
                expr =                                b*c0.diff(x) + g*c0.diff(x,2)
            elif j == 1:
                expr =        b*c1 + 2*g*c1.diff(x) + b*c0.diff(x) + g*c0.diff(x,2)
            elif j == jmax-2:
                expr = g*c2 + b*c1 + 2*g*c1.diff(x)
            elif j == jmax-1:
                expr = g*c2
            else:
                expr = g*c2 + b*c1 + 2*g*c1.diff(x) + b*c0.diff(x) + g*c0.diff(x,2)
            charar[i][j] = set_to_zero(expand(expr))
            coef_sum_array[i][j] = sum_of_coef(charar[i][j])

    pprint.pprint(Matrix(coef_sum_array))

main()

Performance after optimization #5

I think it would be very interesting to look at the number of terms in c[i][j] to determine how quickly the expressions are growing. That would definitely help in estimating the complexity of the current code.
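Counting the terms is straightforward; here is a small sketch for the SymPy version of charar (for SymEngine, len(expr.args) on an Add gives the same count):

from sympy import Add

def term_count(expr):
    # number of top-level summands in an expanded expression
    return len(Add.make_args(expr))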

But for practical purposes I've plotted the current time and memory consumption of the SymEngine code above and managed to get the following chart:

performance_modif6

Both the time and the memory seem to be growing polynomially with the input (the power parameter in the original code).

The same chart but as a log-log plot can be viewed here:

performance_modif6_loglog

Like the wiki page says, a straight line on a log-log plot corresponds to a monomial. This offers a way to recover the exponent of the monomial.
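In other words, if the data follows y = c*x^m, then log(y) = log(c) + m*log(x), so the slope of the straight line on the log-log plot is the exponent m, and it can be recovered from two points as m = log(y1/y0)/log(x1/x0), which is exactly what find_slope below computes.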

So if we consider two points N=16 and N=32, between which the log-log plot looks like a straight line:

import pandas as pd
from math import log

df = pd.read_csv("modif6_bench.txt", sep=',', header=0)

def find_slope(col1,col2,i1,i2):
    xData = df[col1].to_numpy()
    yData = df[col2].to_numpy()
    
    x0,x1 = xData[i1],xData[i2]
    y0,y1 = yData[i1],yData[i2]
    
    m = log(y1/y0)/log(x1/x0)
    return m

print("time slope = {0:0.2f}".format(find_slope("N","time",16,32)))
print("memory slope = {0:0.2f}".format(find_slope("N","memory",16,32)))

Output:

time slope = 5.69
memory slope = 2.62

So a very rough approximation of the time complexity would be O(n^5.69), and an approximation of the space complexity would be O(n^2.62).

There are more details about deciding whether the growth rate is polynomial or exponential here (it involves drawing a semi-log and a log-log plot, and seeing where the data shows up as a straight line).
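For example, here is a small sketch (reusing the same benchmark file as above) that draws both plots side by side:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("modif6_bench.txt", sep=',', header=0)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
# exponential growth shows up as a straight line on a semi-log plot
ax1.semilogy(df["N"], df["time"], marker='.')
ax1.set_title("semi-log")
# polynomial growth shows up as a straight line on a log-log plot
ax2.loglog(df["N"], df["time"], marker='.')
ax2.set_title("log-log")
plt.show()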

Performance with defined b and g functions

In the first original code block, the functions b and g were undefined functions. This means SymPy and SymEngine didn't know anything about them.

The 2nd original code block defines b=1+x and g=1+x+x**2. If we run all of this again with known b and g, the code runs much faster, and both the running time curve and the memory usage curve are better than with unknown functions:

[running time and memory usage plots for defined b and g]

time slope = 2.95
memory slope = 1.35

Recorded data fitting onto known growth-rates

I wanted to look a bit more into matching the observed resource consumption (time and memory), so I wrote the following Python module, which fits each growth rate from a catalog of common growth rates to the recorded data and then shows the plot to the user.

It can be installed via pip3 install --user matchgrowth

When run like this:

match-growth.py --infile ./tests/modif7_bench.txt --outfile time.png --col1 N --col2 time --top 1

It produces graphs of the resource usage, as well as the closest growth rates it matches to. In this case, it finds the polynomial growth to be closest:

[matchgrowth output: recorded resource usage with the closest growth-rate fits]

Other notes

If you run this for power=8 (in the symengine code mentioned above) the coefficients will look like this:

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[1, 5, 4, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[1, 17, 40, 31, 9, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[1, 53, 292, 487, 330, 106, 16, 1, 0, 0, 0, 0, 0, 0, 0, 0]
[1, 161, 1912, 6091, 7677, 4693, 1520, 270, 25, 1, 0, 0, 0, 0, 0, 0]
[1, 485, 11956, 68719, 147522, 150706, 83088, 26573, 5075, 575, 36, 1, 0, 0, 0, 0]
[1, 1457, 73192, 735499, 2568381, 4118677, 3528928, 1772038, 550620, 108948, 13776, 1085, 49, 1, 0, 0]
[1, 4373, 443524, 7649215, 42276402, 102638002, 130209104, 96143469, 44255170, 13270378, 2658264, 358890, 32340, 1876, 64, 1]

So as it turns out, the 2nd column coincides with A048473, which according to OEIS is "The number of triangles (of all sizes, including holes) in Sierpiński's triangle after n inscriptions".
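The nonzero entries of that column also match the closed form 2*3^n - 1, which is easy to check against the table above:

# nonzero entries of the 2nd column of the power=8 table, for n = 1..7
col2 = [5, 17, 53, 161, 485, 1457, 4373]
assert all(v == 2*3**n - 1 for n, v in enumerate(col2, start=1))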

All the code for this is also available in this repo.

wsdookadr
    +1 for bringing OEIS into the picture, always a nice touch. I'd give you another +1 for a general Hall of Fame level of effort but alas, I have but one upvote to give. Keep up the good work is all I can say. – Robert Dodier Jan 16 '21 at 02:38
    Huge thanks for looking into this, @wsdookadr :) what's your view which one to choose: symbolic calculations or numerical for this specific issue? – Gabrielė Mongirdaitė Jan 16 '21 at 18:13
  • @GabrielėMongirdaitė I've added some more notes above about performance. I've also used your initial `b=1+x` and `g=1+x+x**2` to see how the running time and memory usage behave. I've seen that usually numerical methods are used when it's too expensive to use symbolic ones (and there is a precision penalty for numeric methods). But if you define your functions `b` and `g` the running time is not so bad (see above). I think it also depends a lot on what the real `b` and `g` functions are in your case. Are they actually `1+x` and `1+x+x**2` or something different ? – wsdookadr Jan 17 '21 at 03:47
  • @GabrielėMongirdaitė can I ask if the slope computation above is correct ? (for some definition of "correct"). Is this typically how it's done in practice? (I'm new to that part so this is why I'm asking) Thanks! – wsdookadr Jan 17 '21 at 03:49
  • @wsdookadr, the overall idea is to check whether the sums of coefficients divided by the respective factorial start to converge to 0. E.g. taking the first 150 with specific `b` and `g` functions we still see an increasing trend. Answering your question, at this moment it is enough to work with `1+x` and `1+x+x**2`. And in my view, the slope computation is correct :) – Gabrielė Mongirdaitė Jan 17 '21 at 12:22
  • By "division by the respective factorial" I have in mind `values.append(sum(coef_sum_array[i])) values.insert(0, 2) answer = [] for i in range(1, power): answer.append(values[i - 1] / math.factorial(i)) plt.scatter(range(1, power), answer, marker='.', color='black') plt.show()` – Gabrielė Mongirdaitė Jan 17 '21 at 12:24

Relations between polynomial coefficients from the i-th line with coefficients from the (i-1)-th line

In the previous post c[i][j] was calculated. It's possible to check that deg(c[i][j]) = j+1.

This can be checked by initializing a separate 2d array, and computing the degree like so:

deg[i][j] = degree(poly(parse_expr(str(charar[i][j]))))

Vertical formulas:

Then if we denote by u(i,j,k) the coefficient of x^k in c[i][j], we can try to find formulas for u(i,j,k) in terms of u(i-1,_,_). Formulas for u(i,j,_) will be the same as formulas for u(i+1,j,_) (and all following rows), so there's some opportunity there for caching.
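A hypothetical sketch of extracting u(i,j,k), assuming b and g have been replaced by concrete polynomials so that c[i][j] is itself a polynomial in x:

from sympy import Poly, Symbol, parse_expr

x = Symbol('x')

def u(charar, i, j, k):
    # coefficient of x**k in c[i][j]
    p = Poly(parse_expr(str(charar[i][j])), x)
    return p.coeff_monomial(x**k)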

Horizontal formulas:

It's also interesting that when we fix i, the formulas for u(i,j,_) look the same as those for u(i,j+1,_), except for the last 3 values of k. But I'm not sure if this can be leveraged.

The caching steps mentioned above might help to skip unnecessary computations.

See more about this here.

Some notes about analytical, closed-form solutions and asymptotics

I'm struggling to derive this analytically

Yes, this seems to be hard. The closest class of recursive sequences to the one mentioned here is that of Holonomic sequences (also called D-finite or P-recursive). The sequence c[i][j] is not C-finite because it has polynomial coefficients (and in the general case even the asymptotics of recurrences with polynomial coefficients is an open problem).
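For reference, a univariate sequence a(n) is holonomic (P-recursive) if it satisfies p_0(n)*a(n) + p_1(n)*a(n+1) + ... + p_r(n)*a(n+r) = 0 for polynomials p_0, ..., p_r that are not all zero; C-finite is the special case where all the p_i are constants.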

However, the recurrence relation for c[i][j] does not qualify for this because of the derivatives. If we were to leave out the derivatives in the formula of c[i][j] then it would qualify as a Holonomic sequence. Here are some places where I found solutions for these:

  1. "The Concrete Tetrahedron: Symbolic Sums, Recurrence Equations, Generating Functions, Asymptotic Estimates" by Kauers and Paule - Chapter 7 Holonomic Sequences and Power Series
  2. Analytic Combinatorics by Flajolet and Sedgewick - Appendix B.4 Holonomic Functions

But c[i][j] is also a several-variable recurrence, which is another reason why it doesn't fit into the theory mentioned above.

There is however another book called Analytic Combinatorics in Several Variables by Robin Pemantle and Mark C. Wilson which does handle several variable recurrences.

All the books mentioned above require a lot of complex analysis, and they go much beyond the little math that I currently know, so hopefully someone with a more solid understanding of that kind of math can try this out.

The most advanced CAS that has generating-function-related operations and can operate on this kind of sequence is Maple with the gfun package (see the gfun repo), which for now only handles the univariate case.

wsdookadr