Context:
I have a large 3D array input_array
(size: 1000 x 100 x 50) for which I want to do an operation on specific pixels (their indices are gathered in array idx_valid
). Everytime I do an operation on one of the pixels, I grab the output and put it in the output_array
(size 300 x 100 x 50). I also have 3 other input arrays (dates
, dt_start
, dt_end
) that are used in my real code but not in this example.
Goal:
I want to parallelize these operations because each pixel operation can be treated individually. For that I use Numba, but I can not manage to initialize properly the decorator.
Decorator description:
I have 6 input in my function that I parallelize: input_array
, idx_valid
, dates
, dt_start
, dt_end
, 100
.
Their formats are respectively: float64[:,:,::1], int64[:,::1], float64[::1], float64[::1], float64[:1], int_
.
I have 1 output for my function: output_array
with format float64[:,:,::1]
.
Questions:
- How can I initialize the decorator without getting the following error:
TypeError: No matching definition for argument type(s) array(float64, 3d, C), array(int32, 2d, C), array(float64, 1d, C), array(float64, 1d, C), array(float64, 1d, C), int64
- If I write my decorator
@nb.njit(parallel=True, fastmath=True)
it doesn't work, but with@nb.jit(parallel=True, fastmath=True)
(without the "n") the code works. Why is that ?
Code Example:
import numpy as np
import numba as nb
# Generate random temporal matix time*y*x
input_array = np.random.rand(1000, 100, 50)
# Generate a 2x10000 array of random pixel indices of the 2D footprint of input_array (x*y): 100x50
idx_valid = np.empty((2, 1000), dtype=int)
idx_valid[0] = np.random.randint(0, input_array.shape[1], size=1000)
idx_valid[1] = np.random.randint(0, input_array.shape[2], size=1000)
# Generate temporal matrices, numerical format
dates = np.random.rand(300) # That one is only a fraction of the temporal length of input_array
dt_start = np.random.rand(1000)
dt_end = np.random.rand(1000)
# get_results + interpolator
@nb.njit(nb.float64[:, :, ::1](nb.float64[:,:,::1], nb.int64[:,::1], nb.float64[::1], nb.float64[::1], nb.float64[::1], nb.int_), parallel=True, fastmath=True)
def invertor_parallelized(input_array, idx_valid, dates, dt_start, dt_end, lamb):
# create the output array
output_array = np.full((len(dates), input_array.shape[1], input_array.shape[2]), np.nan)
# Initialize matrices for the operation
A = np.random.rand(1000, 300) # create a 40x300 random array
LHS = np.random.rand(300, 300) # create a 300x300 random array
# Where the parallelization happens, we iterate through the different valid pixels of input_array
for i in nb.prange(idx_valid.shape[1]):
# operation (it's an inversion)
output_array[:,idx_valid[0][i],idx_valid[1][i]] = np.linalg.solve(LHS, np.matmul(np.transpose(A), input_array[:,idx_valid[0][i],idx_valid[1][i]]))
return output_array
nb.set_num_threads(6)
# Example of usage, with lamb = 100:
invertor_parallelized(input_array, idx_valid, dates, dt_start, dt_end, 100)
Code that works:
# Generate random temporal matix time*y*x
input_array = np.random.rand(1000, 100, 50)
# Generate a 2x10000 array of random pixel indices of the 2D footprint of input_array (x*y): 100x50
idx_valid = np.empty((2, 1000), dtype=int)
idx_valid[0] = np.random.randint(0, input_array.shape[1], size=1000)
idx_valid[1] = np.random.randint(0, input_array.shape[2], size=1000)
# Generate temporal matrices, numerical format
dates = np.random.rand(300) # That one is only a fraction of the temporal length of input_array
dt_start = np.random.rand(1000)
dt_end = np.random.rand(1000)
# get_results + interpolator
@nb.jit(parallel=True, fastmath=True)
def invertor_parallelized(input_array, idx_valid, dates, dt_start, dt_end, lamb):
# create the output array
output_array = np.full((len(dates), input_array.shape[1], input_array.shape[2]), np.nan)
# Initialize matrices for the operation
A = np.random.rand(1000, 300) # create a 40x300 random array
LHS = np.random.rand(300, 300) # create a 300x300 random array
# Where the parallelization happens, we iterate through the different valid pixels of input_array
for i in nb.prange(idx_valid.shape[1]):
# operation (it's an inversion)
output_array[:,idx_valid[0][i],idx_valid[1][i]] = np.linalg.solve(LHS, np.matmul(np.transpose(A), input_array[:,idx_valid[0][i],idx_valid[1][i]]))
return output_array
nb.set_num_threads(6)
# Example of usage, with lamb = 100:
invertor_parallelized(input_array, idx_valid, dates, dt_start, dt_end, 100)