Numpy vectorize for a lambda with multiple arguments

Question

I tried for a while last night to use the various different numpy vectorisation functions for a task (which to be clear is perfectly doable without them), to create linear interpolations between points.

Let's say I have a vector of floats (let's call them "points"),

v = np.array([9. , 1. , 4.2, 5.6, 3. , 4.6])

I want to interpolate between adjacent points, so I need to take these pairs:

def adjacent_pairs(v):
    """
    Given a 1D numpy array `v = np.array([1, ..., n])`, return a 2D numpy array of
    adjacent pairs, `np.array([(1,2), ..., (n-1,n)])`.
    """
    s = v.shape
    d = len(s)
    assert d == 1, ValueError(f"Vector must be 1D - got a {d}D vector: shape = {s})")
    return np.vstack([v[:-1],v[1:]]).T

adjacent_pairs(v) gives:

array([[9. , 1. ],
       [1. , 4.2],
       [4.2, 5.6],
       [5.6, 3. ],
       [3. , 4.6]])

I want to interpolate these pairs (the rows of the matrix, e.g. [9., 1.]) by intervals of size 0.2, but the interpolations may be ascending or descending, so I normalise the difference vector to find the 'direction' or sign (+1 if ascending, -1 for descending) and multiply this by the step size to pass to arange as the step argument.

This works:

def interpolate_1d(v, step=0.2):
    v_adj = adjacent_pairs(v)
    d = np.diff(v_adj) / np.abs(np.diff(v_adj))
    interpolated = [np.arange(*r, diff * step) for r, diff in zip(v_adj, d)]
    return interpolated

However I'm conscious that the zip() part isn't "in" numpy, and perhaps I should be doing it in a way that is.

I started looking at the various 'vectorised' functions in numpy (which as I understand it can sometimes speed up your code), but I'm having trouble reformatting this code into the abstractions of np.fromiter, np.vectorize, or np.frompyfunc and after a few hours last night I'm hoping someone more familiar with these can enlighten me as to how I could use one or more of these with my code.

I'd prefer to pass in the row and the difference sign separately (as lambda row, diff: ...), but I couldn't manage to get these to work, so I hstacked the v_adj and d arrays so that each row would hold both of them (and I would only need one argument to the lambda).

Here are the two versions of the function:

def interpolate_1d_vectorised(v, step=0.2):
    """
    Couldn't get this to work: how to expand out the two parts at a time to pass to
    the lambda function?
    """
    v_adj = adjacent_pairs(v)
    d = np.diff(v_adj) / np.abs(np.diff(v_adj))
    # lambda_func = lambda row, diff: np.arange(*row, diff * step)
    lambda_func = lambda row, diff: np.arange(row[0], row[1], diff * step)
    row_arange = np.vectorize(lambda_func, signature="(),()->()")
    interpolated = row_arange(v_adj, d)
    return interpolated


def interpolate_1d_vectorised_triples(v, step=0.2):
    v_adj = adjacent_pairs(v)
    d = np.diff(v_adj) / np.abs(np.diff(v_adj))
    triples = np.hstack([v_adj, d])
    triple_lambda = lambda t: np.arange(t[0], t[1], t[2] * step)
    row_arange_t = np.vectorize(triple_lambda, signature="()->()")
    interpolated = row_arange_t(triples)
    return interpolated

Some sample errors I got:

ValueError: setting an array element with a sequence.
- from row_arange(v_adj, d) where row_arange = np.vectorize(lambda_func, signature="(),()->()") (as in interpolate_1d_vectorised)
- also from np.fromiter([np.arange(a,b,c * step) for (a,b,c) in triples])

I tried debugging with a lambda function that just prints out the values it's working on, and it seems that the vectorising happens over every value in the array, rather than every row (which is what I'd like). This seems to explain the error message, but I'm still unclear on how to take three values at a time (or a row at a time) as input into the vectorised function and produce one output per that input.

I've used np.apply_along_axis and np.apply_over_axes before but I was getting various errors using these too.

I expected this to work:

triple_lambda = lambda t: np.arange(t[0], t[1], t[2] * 0.2)
np.apply_along_axis(triple_lambda, 1, triples)

but it gave: ValueError: could not broadcast input array from shape (16) into shape (40), which I assume means that the interpolated values make the vector larger.

np.apply_over_axes(triple_lambda, triples, axes=[0,2]) gave TypeError: <lambda>() takes 1 positional argument but 2 were given (same when axes=[0,1]).

(This was about the point I gave up)

Sorry if this isn't the right application to use these functions for, please let me know if there's something else better meant for this (and what if anything these functions would be used for instead). I was going to just delete these attempts and move on but thought I should ask here so I might learn how to use these functions in future. Any advice much appreciated!

None of those functions, `np.vectorize``, `np.apply...` speed up your code. `np.vectorize` has an explicit disclaimer. Sometimes they are convenient when writing the loops is awkward, but they aren't the speedup tools you are looking for. — hpaulj, Apr 21 '20 at 15:46
Yes, I did see that disclaimer and that's why I added the note about them "sometimes" doing so. I'd still like to understand how to use them either way. My primary reason for trying to rewrite in this format was to get rid of the `list(zip())` construction which looks un-numpy-ish — Louis Maddox, Apr 21 '20 at 16:12
Nice work. np.repeat is the trick for this sort of non-constant interval size — Mad Physicist, Apr 21 '20 at 19:16

Mad Physicist · Answer 1 · 2020-04-21T21:06:45.480

So to start with, lambda is equivalent to def, but more restrictive. You really don't need to use lambda, since you can pass in any function by name, as you would any other object.

Second, np.vectorize is basically a glorified for loop. It processes one element at a time. You do not have the option of returning values of different sizes, which you need here. This explains your current errors. Even without the errors, it really won't do much better than your initial zip. From the docs:

The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.

Let's start by computing the number of elements in each range:

ranges = np.diff(v)
sign = np.sign(ranges)
steps = np.ceil(np.abs(ranges) / step).astype(int)
steps[-1] += 1

You can now make a vector of increments that are the same size as output:

increments = np.repeat(step * sign, steps)

You can run cumsum on the increments if you set up the initial values for each segment. The start of each segment is the corresponding value of v, minus the prior residue.

range_start = np.cumsum(steps[:-1])
increments[0] = v[0]
increments[range_start] = v[1:-1] - (v[0:-2] + sign[:-1] * (steps[:-1] - 1) * step)

Now you can just take the cumulative sum (and set the last element):

result = np.cumsum(increments)
result[-1] = v[-1]

You will likely run into some problems with roundoff error occasionally, which is why the general-purpose solution of removing arbitrary residuals is a good idea. It also handles non-integer multiples of the step properly:

>>> interpolate_1d(v)
array([9. , 8.8, 8.6, 8.4, 8.2, 8. , 7.8, 7.6, 7.4, 7.2, 7. , 6.8, 6.6,
       6.4, 6.2, 6. , 5.8, 5.6, 5.4, 5.2, 5. , 4.8, 4.6, 4.4, 4.2, 4. ,
       3.8, 3.6, 3.4, 3.2, 3. , 2.8, 2.6, 2.4, 2.2, 2. , 1.8, 1.6, 1.4,
       1.2, 1. , 1.2, 1.4, 1.6, 1.8, 2. , 2.2, 2.4, 2.6, 2.8, 3. , 3.2,
       3.4, 3.6, 3.8, 4. , 4.2, 4.4, 4.6, 4.8, 5. , 5.2, 5.4, 5.6, 5.4,
       5.2, 5. , 4.8, 4.6, 4.4, 4.2, 4. , 3.8, 3.6, 3.4, 3.2, 3. , 3.2,
       3.4, 3.6, 3.8, 4. , 4.2, 4.4, 4.6])
>>> interpolate_1d([1., 2.5, 1.])
array([1. , 1.2, 1.4, 1.6, 1.8, 2. , 2.2, 2.4, 2.5, 2.3, 2.1, 1.9, 1.7,
       1.5, 1.3, 1.1, 1. ])

On that note, if you are 100% sure that all your ranges are multiples of the step size, and you don't care about a little roundoff error, you can just sum over the original definition of increments without any further modification:

increments = np.repeat(step * sign, steps)
increments[0] = v[0]
result = np.cumsum(increments)

TL;DR

def interpolate_1d(v, step=0.2):
    ranges = np.diff(v)
    sign = np.sign(ranges)
    steps = np.ceil(np.abs(ranges) / step).astype(int)
    steps[-1] += 1
    range_start = np.cumsum(steps[:-1])
    increments = np.repeat(step * sign, steps)
    increments[0] = v[0]
    increments[range_start] = v[1:-1] - (v[0:-2] + sign[:-1] * (steps[:-1] - 1) * step)
    result = np.cumsum(increments)
    result[-1] = v[-1]
    return result

Numpy vectorize for a lambda with multiple arguments

1 Answers1