I tried for a while last night to use the various different numpy vectorisation functions for a task (which to be clear is perfectly doable without them), to create linear interpolations between points.
Let's say I have a vector of floats (let's call them "points"),
v = np.array([9. , 1. , 4.2, 5.6, 3. , 4.6])
I want to interpolate between adjacent points, so I need to take these pairs:
def adjacent_pairs(v):
"""
Given a 1D numpy array `v = np.array([1, ..., n])`, return a 2D numpy array of
adjacent pairs, `np.array([(1,2), ..., (n-1,n)])`.
"""
s = v.shape
d = len(s)
assert d == 1, ValueError(f"Vector must be 1D - got a {d}D vector: shape = {s})")
return np.vstack([v[:-1],v[1:]]).T
adjacent_pairs(v)
gives:
array([[9. , 1. ],
[1. , 4.2],
[4.2, 5.6],
[5.6, 3. ],
[3. , 4.6]])
I want to interpolate these pairs (the rows of the matrix, e.g. [9., 1.]
) by intervals of size 0.2, but the interpolations may be ascending or descending, so I normalise the difference vector to find the 'direction' or sign (+1 if ascending, -1 for descending) and multiply this by the step size to pass to arange
as the step
argument.
This works:
def interpolate_1d(v, step=0.2):
v_adj = adjacent_pairs(v)
d = np.diff(v_adj) / np.abs(np.diff(v_adj))
interpolated = [np.arange(*r, diff * step) for r, diff in zip(v_adj, d)]
return interpolated
However I'm conscious that the zip()
part isn't "in" numpy, and perhaps I should be doing it in a way that is.
I started looking at the various 'vectorised' functions in numpy (which as I understand it can sometimes speed up your code), but I'm having trouble reformatting this code into the abstractions of np.fromiter
, np.vectorize
, or np.frompyfunc
and after a few hours last night I'm hoping someone more familiar with these can enlighten me as to how I could use one or more of these with my code.
I'd prefer to pass in the row and the difference sign separately (as lambda row, diff: ...
), but I couldn't manage to get these to work, so I hstack
ed the v_adj
and d
arrays so that each row would hold both of them (and I would only need one argument to the lambda).
Here are the two versions of the function:
def interpolate_1d_vectorised(v, step=0.2):
"""
Couldn't get this to work: how to expand out the two parts at a time to pass to
the lambda function?
"""
v_adj = adjacent_pairs(v)
d = np.diff(v_adj) / np.abs(np.diff(v_adj))
# lambda_func = lambda row, diff: np.arange(*row, diff * step)
lambda_func = lambda row, diff: np.arange(row[0], row[1], diff * step)
row_arange = np.vectorize(lambda_func, signature="(),()->()")
interpolated = row_arange(v_adj, d)
return interpolated
def interpolate_1d_vectorised_triples(v, step=0.2):
v_adj = adjacent_pairs(v)
d = np.diff(v_adj) / np.abs(np.diff(v_adj))
triples = np.hstack([v_adj, d])
triple_lambda = lambda t: np.arange(t[0], t[1], t[2] * step)
row_arange_t = np.vectorize(triple_lambda, signature="()->()")
interpolated = row_arange_t(triples)
return interpolated
Some sample errors I got:
ValueError: setting an array element with a sequence.
- from
row_arange(v_adj, d)
whererow_arange = np.vectorize(lambda_func, signature="(),()->()")
(as ininterpolate_1d_vectorised
) - also from
np.fromiter([np.arange(a,b,c * step) for (a,b,c) in triples])
- from
I tried debugging with a lambda function that just prints out the values it's working on, and it seems that the vectorising happens over every value in the array, rather than every row (which is what I'd like). This seems to explain the error message, but I'm still unclear on how to take three values at a time (or a row at a time) as input into the vectorised function and produce one output per that input.
I've used np.apply_along_axis
and np.apply_over_axes
before but I was getting various errors using these too.
I expected this to work:
triple_lambda = lambda t: np.arange(t[0], t[1], t[2] * 0.2)
np.apply_along_axis(triple_lambda, 1, triples)
but it gave: ValueError: could not broadcast input array from shape (16) into shape (40)
,
which I assume means that the interpolated values make the vector larger.
np.apply_over_axes(triple_lambda, triples, axes=[0,2])
gave TypeError: <lambda>() takes 1 positional argument but 2 were given
(same when axes=[0,1]
).
(This was about the point I gave up)
Sorry if this isn't the right application to use these functions for, please let me know if there's something else better meant for this (and what if anything these functions would be used for instead). I was going to just delete these attempts and move on but thought I should ask here so I might learn how to use these functions in future. Any advice much appreciated!