4

I am working on simulating traps in CCD arrays. Currently I am using NumPy and Scipy, and I have been able to vectorize most of the calls which have given me some speed-up. At the moment the bottleneck in my code is that I have to retrieve a number from a large number of different interpolations in the inner loop of my code. This particular step takes up ~97% of the computing time.

I have made a simple example of my problem here:

import numpy as np
from scipy.interpolate import interp1d

# the CCD array containing values from 0-100
array = np.random.random(200)*100

# a number of traps at different positions in the CCD array 
n_traps = 100
trap_positions = np.random.randint(0,200,n_traps)

# xvalues for the interpolations
xval = [0,10,100]
# each trap has y values corresponding to the x values 
trap_yvals = [np.random.random(3)*100 for _ in range(n_traps)]
# The xval-to-yval interpolation is made for each trap
yval_interps = [interp1d(xval,yval) for yval in trap_yvals]

# moving the trap positions down over the array
for i in range(len(array)):
    # calculating new trap position
    new_trap_pos = trap_positions+i
    # omitting traps that are outside array
    trap_inside_array = new_trap_pos < len(array)
    # finding the array_vals (corresponding to the xvalues in the interpolations)
    array_vals = array[new_trap_pos[trap_inside_array]]

    # retrieving the interpolated y-values (this is the bottleneck)
    yvals = np.array([yval_interps[trap_inside_array[t]](array_vals[t]) 
                       for t in range(len(array_vals))])

    # some more operations using yvals

Is there a way this can be optimized, maybe using Cython or similar?

Skottfelt
  • 86
  • 8
  • 1
    see [this](http://stackoverflow.com/questions/23909266/interpolation-and-extrapolation-for-large-arrays/34289911#34289911), instead of interp1d, use InterpolatedUnivariateSpline. Improves performance by orders of magnitude – M.T Feb 19 '16 at 15:15
  • Another significant improvement is to pass arrays as an argument to interp1d/InterpolatedUnivariateSpline instead of looping over single values if possible – M.T Feb 19 '16 at 15:26
  • @M.T: Thanks for the hint about speed-up using InterpolatedUnivariateSpline. The reason I am looping over the single values, is that each value needs to be extracted from a different interpolation, and I have found no way around this. – Skottfelt Feb 19 '16 at 20:05

1 Answers1

3

I have mulled this over a bit and I think I have found a pretty good solution that I wanted to share, although this means that I will be answering my own question.

First of all it dawned on me that instead of using one of the scipy.interpolation functions, I could just find the interpolation between two values. This can be done with this little function

from bisect import bisect_left

def two_value_interpolation(x,y,val):
    index = bisect_left(x,val)
    _xrange = x[index] - x[index-1]
    xdiff = val - x[index-1]
    modolo = xdiff/_xrange
    ydiff = y[index] - y[index-1]
    return y[index-1] + modolo*ydiff

This gave me some speed-up, but I wanted to see if I could do even better so I ported the function to Cython and added the loop over all the traps so I didn't have to do that in the python code:

# cython: boundscheck=False
# cython: wraparound=False
# cython: cdivision=True

import numpy as np
cimport numpy as np

def two_value_interpolation_c(np.ndarray[np.float64_t] x, 
                                 np.ndarray[np.float64_t, ndim=2] y,
                                 np.ndarray[np.float64_t] val_array):
    cdef unsigned int index, trap
    cdef unsigned int ntraps=val_array.size
    cdef long double val, _xrange, xdiff, modolo, ydiff
    cdef np.ndarray y_interp = np.zeros(ntraps, dtype=np.float64)

    for trap in range(ntraps):
        index = 0
        val = val_array[trap]
        while x[index] <= val:
            index += 1

        _xrange = x[index] - x[index-1]
        xdiff = val - x[index-1]
        modolo = xdiff/_xrange
        ydiff = y[trap,index] - y[trap,index-1]
        y_interp[trap] = y[trap,index-1] + modolo*ydiff
    return y_interp

I ran some timings on the different methods (using some larger arrays and more traps than indicated in the original question):

Using the original method, i.e interp1d: (best of 3) 15.1 sec

Using InterpolatedUnivariateSpline (k=1) instead of interp1d as suggested by @M.T: (best of 3) 7.25 sec

Using the two_value_interpolation function: (best of 3) 1.34 sec

Using the Cython implementation two_value_interpolation_c: (best of 3) 0.113 sec

Skottfelt
  • 86
  • 8