How to optimize a looped parameterized function call using numpy?

Question

I am trying to convert a nested loop over a numpy array into a numpy-optimized implementation. The function called inside the loop takes a 4D vector and a separate parameter, and returns a new 4D vector that replaces the old one. If relevant, the function is a Welford online update, which updates the mean and standard deviation based on a new value, with the 4D vector being [old_mean, old_std, old_s, num_values]. For each pixel channel, I save these values in history_array so that the distribution can be updated with future pixel values.

My present code looks like this:

def welford_next(arr:np.ndarray, new_point:np.float32) -> np.ndarray:
    old_mean, _, old_s, num_points = arr

    num_points += 1
    new_mean = old_mean + (new_point - old_mean) / num_points
    new_s = old_s + (new_point - old_mean) * (new_point - new_mean)

    return [new_mean, np.sqrt(new_s / num_points) if num_points > 1 else new_s, new_s, num_points]

updates = [10., 20., 30., 40., 90., 80.]
history_array = np.zeros(shape = b.shape + (4,))   # shape: [6,3,3,4]
print(f'History Shape: {history_array.shape}')
history_array_2 = np.zeros_like(history_array)

for update in updates:
    image = np.empty(shape = b.shape)              # shape: [6,3,3] (h x w x c)
    image.fill(update)

    for i, row in enumerate(image):                # Prohibitively expensive
        for j, col in enumerate(row):
            for k, channel in enumerate(col):
                history_array[i][j][k] = welford_next(history_array[i][j][k], channel)

    history_array_2 = np.apply_along_axis(welford_next, axis=2, arr=history_array_2)
    
    print(history_array == history_array_2)

However, np.apply_along_axis() does not seem to be viable because it does not allow additional parameters to be passed alongside the array itself. I also came across np.ufunc, which welford_next() can be converted to using np.frompyfunc(), but it is unclear how that would help me reach the desired result.

How do I achieve this looped operation using numpy?

If `welford_next` has to be called once for each `i,j,k` combination, there isn't a way of "speeding up" those calls. Unless that function is trivial, the majority of the time will be spent making those calls, not in the details of the iteration mechanism. — hpaulj, Dec 20 '22 at 21:01
`apply_along_axis` is NOT a performance/optimization tool. It needs a disclaimer, much like the one `np.vectorize` has. `frompyfunc` isn't any better: it's a version of `np.vectorize` that returns object dtype arrays. `apply_along_axis` is designed for a function that takes one 1d array as an argument. `frompyfunc` and `vectorize` are for functions that take one or more scalar arguments. They can be handy if you want to take advantage of `broadcasting` to work with several arrays, but they aren't speed tools. — hpaulj, Dec 20 '22 at 21:03
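
To make the points in these comments concrete, here is a minimal sketch (not from the original thread). The functions `add_scaled` and `shift` are hypothetical and exist only for illustration. It shows that `np.frompyfunc` broadcasts its inputs but returns an object-dtype array, and that `np.apply_along_axis` forwards any extra positional arguments unchanged to every 1D slice, so it cannot supply a different value per slice.

```python
import numpy as np

def add_scaled(x, y):
    # Hypothetical scalar function, called once per element pair.
    return x + 2.0 * y

def shift(row, offset):
    # Hypothetical 1D function; `offset` is the same for every slice.
    return row + offset

col = np.arange(3.0).reshape(3, 1)
row = np.arange(4.0)

# frompyfunc: 2 inputs, 1 output. Broadcasting works, but the result dtype is object.
ufunc_like = np.frompyfunc(add_scaled, 2, 1)
out_obj = ufunc_like(col, row)

# The same values from plain array arithmetic, with no Python-level call per element.
out_vec = col + 2.0 * row

# Extra arguments after `arr` are passed through to `shift` for each slice.
shifted = np.apply_along_axis(shift, -1, out_vec, 10.0)

print(out_obj.dtype, out_obj.shape)
```

Neither helper avoids the per-element or per-slice Python call, which is why the plain array expression is the one worth aiming for.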

score 1 · Accepted Answer · answered Dec 22 '22 at 20:00

The numpy-optimized way to do this is to change how welford_next() is used. As mentioned in the comments, repeated Python-level calls to a function cannot be sped up, so the function should be called only once per frame and do its work on whole arrays internally. The following implementation runs roughly 50x faster.

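import numpy as np

def welford(history:np.ndarray, frame:np.ndarray) -> np.ndarray:
    # Move the statistics axis to the front so it unpacks into four (h, w, c) arrays.
    old_mean, _, old_s, num_points = np.transpose(history, [3,0,1,2])
    # Not `+=`: the unpacked arrays are views, so `+=` would write into `history`.
    num_points = num_points + 1.
    new_mean = old_mean + (frame - old_mean) / num_points
    new_s = old_s + (frame - old_mean) * (frame - new_mean)
    # Element-wise check, so it does not rely on every pixel sharing one count.
    new_std = np.where(num_points > 1, np.sqrt(new_s / num_points), new_s)

    # Stack the four statistics and move that axis back to the end: (h, w, c, 4).
    return np.transpose(np.array([new_mean, new_std, new_s, num_points]), [1,2,3,0])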

updates = [10., 20., 30., 40., 90., 80.]
history_array = np.zeros(shape = b.shape + (4,))   # shape: [6,3,3,4]

for update in updates:
    image = np.empty(shape = b.shape)              # shape: [6,3,3] (h x w x c)
    image.fill(update)

    history_array = welford(history_array, image)
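
As a sanity check on the approach above, here is a self-contained sketch, written under assumptions rather than taken from the original answer. The name `welford_update`, the random test frames, and their shape are all hypothetical. It applies the same whole-frame update using indexing on the last axis and compares the running statistics against `np.mean` and `np.std` computed over all frames at once.

```python
import numpy as np

def welford_update(history, frame):
    """history: (..., 4) array holding [mean, std, s, n]; frame: matching (...) array."""
    old_mean = history[..., 0]
    old_s = history[..., 2]
    n = history[..., 3] + 1.0            # new array, `history` is left untouched
    new_mean = old_mean + (frame - old_mean) / n
    new_s = old_s + (frame - old_mean) * (frame - new_mean)
    new_std = np.sqrt(new_s / n)         # population std; evaluates to 0 when n == 1
    return np.stack([new_mean, new_std, new_s, n], axis=-1)

# Assumed test data: 6 random frames of shape (4, 5, 3).
rng = np.random.default_rng(0)
frames = rng.random((6, 4, 5, 3))

history = np.zeros(frames.shape[1:] + (4,))
for frame in frames:                     # one call per frame, not per pixel
    history = welford_update(history, frame)

mean_ok = np.allclose(history[..., 0], frames.mean(axis=0))
std_ok = np.allclose(history[..., 1], frames.std(axis=0))
print(mean_ok, std_ok)
```

Because `new_s` is exactly 0 after the first sample, the square root already gives 0 in that case, so this sketch drops the special case for a count of 1. Whether that simplification suits a given use is a judgment call.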
