6

I need to evaluate b-splines in python. To do so i wrote the code below which works very well.

import numpy as np
import scipy.interpolate as si

def scipy_bspline(cv,n,degree):
    """ bspline basis function
        c        = list of control points.
        n        = number of points on the curve.
        degree   = curve degree
    """
    # Create a range of u values
    c = cv.shape[0]
    kv = np.clip(np.arange(c+degree+1)-degree,0,c-degree)
    u  = np.linspace(0,c-degree,n)
    
    # Calculate result
    return np.array(si.splev(u, (kv,cv.T,degree))).T

Giving it 6 control points and asking it to evaluate 100k points on curve is a breeze:

# Control points
cv = np.array([[ 50.,  25., 0.],
       [ 59.,  12., 0.],
       [ 50.,  10., 0.],
       [ 57.,   2., 0.],
       [ 40.,   4., 0.],
       [ 40.,   14., 0.]])

n = 100000  # 100k Points
degree = 3 # Curve degree
points_scipy = scipy_bspline(cv,n,degree) #cProfile clocks this at 0.012 seconds

Here's a plot of "points_scipy": enter image description here

Now, to make this faster i could compute a basis for all 100k points on the curve, store that in memory, and when i need to draw a curve all i would need to do is multiply the new control point positions with the stored basis to get the new curve. To prove my point i wrote a function that uses DeBoor's algorithm to compute my basis:

def basis(c, n, degree):
    """ bspline basis function
        c        = number of control points.
        n        = number of points on the curve.
        degree   = curve degree
    """
    # Create knot vector and a range of samples on the curve
    kv = np.array([0]*degree + range(c-degree+1) + [c-degree]*degree,dtype='int') # knot vector
    u  = np.linspace(0,c-degree,n) # samples range
    
    # Cox - DeBoor recursive function to calculate basis
    def coxDeBoor(u, k, d):
        # Test for end conditions
        if (d == 0):
            if (kv[k] <= u and u < kv[k+1]):
                return 1
            return 0
        
        Den1 = kv[k+d] - kv[k]
        Den2 = 0
        Eq1  = 0
        Eq2  = 0
        
        if Den1 > 0:
            Eq1 = ((u-kv[k]) / Den1) * coxDeBoor(u,k,(d-1))
            
        try:
            Den2 = kv[k+d+1] - kv[k+1]
            if Den2 > 0:
                Eq2 = ((kv[k+d+1]-u) / Den2) * coxDeBoor(u,(k+1),(d-1))
        except:
            pass
        
        return Eq1 + Eq2
    

    # Compute basis for each point
    b = np.zeros((n,c))
    for i in xrange(n):
        for k in xrange(c):
            b[i][k%c] += coxDeBoor(u[i],k,degree)

    b[n-1][-1] = 1
                
    return b

Now lets use this to compute a new basis, multiply that by the control points and confirm that we're getting the same results as splev:

b = basis(len(cv),n,degree) #5600011 function calls (600011 primitive calls) in 10.975 seconds
points_basis = np.dot(b,cv) #3 function calls in 0.002 seconds
print np.allclose(points_basis,points_scipy) # Returns True

My extremely slow function returned 100k basis values in 11 seconds, but since these values only need to be computed once, calculating the points on curve ended up being 6 times faster this way than doing it via splev.

The fact that I was able to get the exact same results from both my method and splev leads me to believe that internally splev probably calculates a basis like i do, except much faster.

So my goal is to find out how to calculate my basis fast, store it in memory and just use np.dot() to compute the new points on curve, and my question is: Is it possible to use spicy.interpolate to get the basis values that (i presume) splev uses to compute it's result? And if so, how?

[ADDENDUM]

Following unutbu's and ev-br's very useful insight on how scipy calculates a spline's basis, I looked up the fortran code and wrote a equivalent to the best of my abilities:

def fitpack_basis(c, n=100, d=3, rMinOffset=0, rMaxOffset=0):
    """ fitpack's spline basis function
        c = number of control points.
        n = number of points on the curve.
        d = curve degree
    """
    # Create knot vector
    kv = np.array([0]*d + range(c-d+1) + [c-d]*d, dtype='int')

    # Create sample range
    u = np.linspace(rMinOffset, rMaxOffset + c - d, n)  # samples range
    
    # Create buffers
    b  = np.zeros((n,c)) # basis
    bb = np.zeros((n,c)) # basis buffer
    left  = np.clip(np.floor(u),0,c-d-1).astype(int)   # left  knot vector indices
    right = left+d+1 # right knot vector indices

    # Go!
    nrange = np.arange(n)
    b[nrange,left] = 1.0
    for j in xrange(1, d+1):
        crange = np.arange(j)[:,None]
        bb[nrange,left+crange] = b[nrange,left+crange]        
        b[nrange,left] = 0.0
        for i in xrange(j):
            f = bb[nrange,left+i] / (kv[right+i] - kv[right+i-j])
            b[nrange,left+i] = b[nrange,left+i] + f * (kv[right+i] - u)
            b[nrange,left+i+1] = f * (u - kv[right+i-j])
            
    return b

Testing against unutbu's version of my original basis function:

fb = fitpack_basis(c,n,d) #22 function calls in 0.044 seconds
b = basis(c,n,d) #81 function calls (45 primitive calls) in 0.013 seconds  ~5 times faster
print np.allclose(b,fb) # Returns True

My function is 5 times slower, but still relatively fast. What I like about it is that it lets me use sample ranges that go beyond the boundaries, which is something of use in my application. For example:

print fitpack_basis(c,5,d,rMinOffset=-0.1,rMaxOffset=.2)
[[ 1.331  -0.3468  0.0159 -0.0002  0.      0.    ]
[ 0.0208  0.4766  0.4391  0.0635  0.      0.    ]
[ 0.      0.0228  0.4398  0.4959  0.0416  0.    ]
[ 0.      0.      0.0407  0.3621  0.5444  0.0527]
[ 0.      0.     -0.0013  0.0673 -0.794   1.728 ]]

So for that reason I will probably use fitpack_basis since it's relatively fast. But i would love suggestions for improving its performance and hopefully get closer to unutbu's version of the original basis function i wrote.

Community
  • 1
  • 1
Fnord
  • 5,365
  • 4
  • 31
  • 48
  • Please post the code you used to calculate the basis for each sample. – unutbu Feb 06 '16 at 15:29
  • What exactly do you mean by "per sample basis used splev when evaluating a spline"? You figured how to give splev knots and coefficients, what exactly are you looking for? – ev-br Feb 06 '16 at 19:45
  • @unutbu i rewrote my question completely + added code to (slowly) obtain basis – Fnord Feb 07 '16 at 08:12
  • @ev-br I re-wrote my question, hopefully this will clarify what i'm looking for. – Fnord Feb 07 '16 at 08:13
  • [`scipy.interpolate.splev`](https://github.com/scipy/scipy/blob/master/scipy/interpolate/fitpack.py#L527) delegates the work to the [FITPACK Fortran routine SPLEV](https://github.com/scipy/scipy/blob/master/scipy/interpolate/fitpack/splev.f). That routine calls [`fpbspl`](https://github.com/scipy/scipy/blob/master/scipy/interpolate/fitpack/fpbspl.f) which evaluates the b-splines using the de Boor and Cox recurrence relation. Unfortunately, scipy does not expose fpbspl to be called from Python. It seems to only be called internally by other Fortran routines. – unutbu Feb 07 '16 at 15:27
  • @unutbu i added an addendum to my question (see at bottom): After looking up fitpack's Fortran routine, I wrote a version in python that is a little slower than yours but will suit my needs. If you have any suggestions as to improve its performance, please let me know! – Fnord Feb 12 '16 at 23:46
  • @ev-br slightly off topic: I see that `splev` has an argument to get a spline's derivative (der). [Looking at the source](https://github.com/scipy/scipy/blob/master/scipy/interpolate/fitpack.py#L527) i see that `splev` passes that argument to `_fitpack._spl_` but i can't for the life of me find this function. My goal is to figure out how the derivative is calculated so i can use the info to calculate the tangent on spline at specific locations. – Fnord Mar 06 '16 at 07:08
  • @Fnord https://github.com/scipy/scipy/blob/master/scipy/interpolate/src/__fitpack.h#L868; see also the routine `splder` and https://github.com/scipy/scipy/pull/3174. Discussing more of this probably warrants a separate question though. Good luck! – ev-br Mar 06 '16 at 12:38

2 Answers2

4

Here is a version of coxDeBoor which is (on my machine) 840x faster than the original.

In [102]: %timeit basis(len(cv), 10**5, degree)
1 loops, best of 3: 24.5 s per loop

In [104]: %timeit bspline_basis(len(cv), 10**5, degree)
10 loops, best of 3: 29.1 ms per loop

import numpy as np
import scipy.interpolate as si

def scipy_bspline(cv, n, degree):
    """ bspline basis function
        c        = list of control points.
        n        = number of points on the curve.
        degree   = curve degree
    """
    # Create a range of u values
    c = len(cv)
    kv = np.array(
        [0] * degree + range(c - degree + 1) + [c - degree] * degree, dtype='int')
    u = np.linspace(0, c - degree, n)

    # Calculate result
    arange = np.arange(n)
    points = np.zeros((n, cv.shape[1]))
    for i in xrange(cv.shape[1]):
        points[arange, i] = si.splev(u, (kv, cv[:, i], degree))

    return points


def memo(f):
    # Peter Norvig's
    """Memoize the return value for each call to f(args).
    Then when called again with same args, we can just look it up."""
    cache = {}

    def _f(*args):
        try:
            return cache[args]
        except KeyError:
            cache[args] = result = f(*args)
            return result
        except TypeError:
            # some element of args can't be a dict key
            return f(*args)
    _f.cache = cache
    return _f


def bspline_basis(c, n, degree):
    """ bspline basis function
        c        = number of control points.
        n        = number of points on the curve.
        degree   = curve degree
    """
    # Create knot vector and a range of samples on the curve
    kv = np.array([0] * degree + range(c - degree + 1) +
                  [c - degree] * degree, dtype='int')  # knot vector
    u = np.linspace(0, c - degree, n)  # samples range

    # Cox - DeBoor recursive function to calculate basis
    @memo
    def coxDeBoor(k, d):
        # Test for end conditions
        if (d == 0):
            return ((u - kv[k] >= 0) & (u - kv[k + 1] < 0)).astype(int)

        denom1 = kv[k + d] - kv[k]
        term1 = 0
        if denom1 > 0:
            term1 = ((u - kv[k]) / denom1) * coxDeBoor(k, d - 1)

        denom2 = kv[k + d + 1] - kv[k + 1]
        term2 = 0
        if denom2 > 0:
            term2 = ((-(u - kv[k + d + 1]) / denom2) * coxDeBoor(k + 1, d - 1))

        return term1 + term2

    # Compute basis for each point
    b = np.column_stack([coxDeBoor(k, degree) for k in xrange(c)])
    b[n - 1][-1] = 1

    return b

# Control points
cv = np.array([[50.,  25., 0.],
               [59.,  12., 0.],
               [50.,  10., 0.],
               [57.,   2., 0.],
               [40.,   4., 0.],
               [40.,   14., 0.]])

n = 10 ** 6
degree = 3  # Curve degree
points_scipy = scipy_bspline(cv, n, degree)

b = bspline_basis(len(cv), n, degree)
points_basis = np.dot(b, cv)  
print np.allclose(points_basis, points_scipy)

The majority of the speedup is achieved by making coxDeBoor compute a vector of results instead of a single value at a time. Notice that u is removed from the arguments passed to coxDeBoor. Instead, the new coxDeBoor(k, d) calculates what was before np.array([coxDeBoor(u[i], k, d) for i in xrange(n)]).

Since NumPy array arithmetic has the same syntax as scalar arithmetic, very little code needed to change. The only syntactic change was in the end condition:

if (d == 0):
    return ((u - kv[k] >= 0) & (u - kv[k + 1] < 0)).astype(int)

(u - kv[k] >= 0) and (u - kv[k + 1] < 0) are boolean arrays. astype changes the array values to 0 and 1. So whereas before a single 0 or 1 was returned, now a whole array of 0s and 1s are returned -- one for each value in u.

Memoization can also improve performance, since the recurrence relations causes coxDeBoor(k, d) to be called for the same values of k and d more than once. The decorator syntax

@memo
def coxDeBoor(k, d):
    ...

is equivalent to

def coxDeBoor(k, d):
    ...
coxDeBoor = memo(coxDeBoor)

and the memo decorator causes coxDeBoor to record in a cache a mapping from (k, d) pairs of arguments to returned values. If coxDeBoor(k, d) is called again, then value from the cache will be returned instead of re-computing the value.


scipy_bspline is still faster, but at least bspline_basis plus np.dot is in the ballpark, and may be useful if you want to re-use b with many control points, cv.

In [109]: %timeit scipy_bspline(cv, 10**5, degree)
100 loops, best of 3: 17.2 ms per loop

In [104]: %timeit b = bspline_basis(len(cv), 10**5, degree)
10 loops, best of 3: 29.1 ms per loop

In [111]: %timeit points_basis = np.dot(b, cv)
100 loops, best of 3: 2.46 ms per loop
unutbu
  • 842,883
  • 184
  • 1,785
  • 1,677
  • (+1) This is likely the way to go, at least as long as https://github.com/scipy/scipy/pull/3174 is not merged. – ev-br Feb 07 '16 at 23:22
  • @unutbu thanks for the memoization info. This will work fine for my needs until scipy exposes fpbspl. There is a one benefit splev had i'll need to figure out: when given an out of bounds values (ex: say I want to query u=-1.3) splev would return a point that follows the border tangent out. (in scipy_bspline if you change u=np.linspace(-1,c-degree+1,n), you'll see what i mean) I'm not sure this is something splev achieves purely with a basis (with negative values or values > 1?) or if it just catches this condition and forks to something else. That was a nice 'feature' i was using. – Fnord Feb 08 '16 at 07:55
  • @ev-br thanks for the info. I'll keep an eye out for the new features, may they roll in soon! – Fnord Feb 08 '16 at 07:56
3

fitpack_basis uses a double loop which iteratively modifies elements in bb and b. I don't see a way to use NumPy to vectorize these loops since the value of bb and b at each stage of the iteration depends on the values from previous iterations. In situations like this sometimes Cython can be used to improve the performance of the loops.

Here is a Cython-ized version of fitpack_basis, which runs as fast as bspline_basis. The main ideas used to boost speed using Cython is to declare the type of every variable, and rewrite all uses of NumPy fancy indexing as loops using plain integer indexing.

See this page for instructions on how to build the code and run it from python.

import numpy as np
cimport numpy as np
cimport cython

ctypedef np.float64_t DTYPE_f
ctypedef np.int64_t DTYPE_i

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
def cython_fitpack_basis(int c, int n=100, int d=3, 
                         double rMinOffset=0, double rMaxOffset=0):
    """ fitpack's spline basis function
        c = number of control points.
        n = number of points on the curve.
        d = curve degree
    """
    cdef Py_ssize_t i, j, k, l
    cdef double f

    # Create knot vector
    cdef np.ndarray[DTYPE_i, ndim=1] kv = np.array(
        [0]*d + range(c-d+1) + [c-d]*d, dtype=np.int64)

    # Create sample range
    cdef np.ndarray[DTYPE_f, ndim=1] u = np.linspace(
        rMinOffset, rMaxOffset + c - d, n)

    # basis
    cdef np.ndarray[DTYPE_f, ndim=2] b  = np.zeros((n,c)) 
    # basis buffer
    cdef np.ndarray[DTYPE_f, ndim=2] bb = np.zeros((n,c)) 
    # left  knot vector indices
    cdef np.ndarray[DTYPE_i, ndim=1] left = np.clip(np.floor(u), 0, c-d-1).astype(np.int64)   
    # right knot vector indices
    cdef np.ndarray[DTYPE_i, ndim=1] right = left+d+1 

    for k in range(n):
        b[k, left[k]] = 1.0

    for j in range(1, d+1):
        for l in range(j):
            for k in range(n):
                bb[k, left[k] + l] = b[k, left[k] + l] 
                b[k, left[k]] = 0.0
        for i in range(j):
            for k in range(n):
                f = bb[k, left[k]+i] / (kv[right[k]+i] - kv[right[k]+i-j])
                b[k, left[k]+i] = b[k, left[k]+i] + f * (kv[right[k]+i] - u[k])
                b[k, left[k]+i+1] = f * (u[k] - kv[right[k]+i-j])
    return b

Using this timeit code to benchmark it's performance,

import timeit
import numpy as np
import cython_bspline as CB
import numpy_bspline as NB

c = 6
n = 10**5
d = 3

fb = NB.fitpack_basis(c, n, d)
bb = NB.bspline_basis(c, n, d) 
cfb = CB.cython_fitpack_basis(c,n,d) 

assert np.allclose(bb, fb) 
assert np.allclose(cfb, fb) 
# print(NB.fitpack_basis(c,5,d,rMinOffset=-0.1,rMaxOffset=.2))

timing = dict()
timing['NB.fitpack_basis'] = timeit.timeit(
    stmt='NB.fitpack_basis(c, n, d)', 
    setup='from __main__ import NB, c, n, d', 
    number=10)
timing['NB.bspline_basis'] = timeit.timeit(
    stmt='NB.bspline_basis(c, n, d)', 
    setup='from __main__ import NB, c, n, d', 
    number=10)
timing['CB.cython_fitpack_basis'] = timeit.timeit(
    stmt='CB.cython_fitpack_basis(c, n, d)', 
    setup='from __main__ import CB, c, n, d', 
    number=10)

for func_name, t in timing.items():
    print "{:>25}: {:.4f}".format(func_name, t)

it appears Cython can make the fitpack_basis code run as fast as (and perhaps a bit faster) than bspline_basis:

         NB.bspline_basis: 0.3322
  CB.cython_fitpack_basis: 0.2939
         NB.fitpack_basis: 0.9182
Community
  • 1
  • 1
unutbu
  • 842,883
  • 184
  • 1,785
  • 1,677
  • Nice @unutbu! Unfortunately I have very little experience with Cython and I don't know how to use your code in my application. I compiled Cython to work with the custom Python interpreter we use (2.7.3 MSC v.1600 64 bit). I can import cython as a module and call it's functions and classes (ex: cython.array), but all the cython centric commands you have (cdef, cimport, etc) all error out with invalid syntax. – Fnord Feb 15 '16 at 18:22
  • I was able to produce a .pyd file! But I get this error when running cython_fitpack_basis(6) : line 34: Buffer dtype mismatch, expected 'DTYPE_i' but got 'long' # line 34 is cdef np.ndarray[DTYPE_i, ndim=1] left = np.clip(np.floor(u), 0, c-d-1).astype(int) – Fnord Feb 24 '16 at 00:28
  • My mistake. `.astype(int)` should be `.astype(np.int64)`. I've edited the post to reflect that too. – unutbu Feb 24 '16 at 00:47
  • I just found the issue as well and was about to let you know! This is awesome! Thanks a ton! – Fnord Feb 24 '16 at 00:55
  • do you have any advice about cythonizing the function described here: http://stackoverflow.com/questions/35520384/improving-a-numpy-implementation-of-a-simple-spring-network/35525713#35525713 i'm currently trying to make it work but i'm hitting a wall at the proper way to define an array of 3d points. I would greatly appreciate your input – Fnord Feb 24 '16 at 20:19