I have two arrays: arr1 (which has string elements) and arr2 (which has integers). I want to clip the first arr2[i] characters from arr1[i]. These arrays are very large, so I want to implement this in Numba CUDA. A pure-Python implementation is as follows:
arr1 = ['abc', 'def', 'xyz']
arr2 = [1,2,3]

def python_clipper(arr1,arr2):
    for i in range(len(arr1)):
        arr1[i] = arr1[i][arr2[i]:]
    return arr1

print(python_clipper(arr1,arr2)) # ['bc', 'f', '']
The above implementation works fine. But when I turn this Python function into a CUDA kernel like so:
from numba import cuda

@cuda.jit()
def cuda_clipper(arr1,arr2):
    i = cuda.grid(1)  # global thread index; one thread per element
    arr1[i] = arr1[i][arr2[i]:]

blockspergrid, threadsperblock = len(arr1),1
cuda_clipper[blockspergrid, threadsperblock](arr1,arr2)
print(arr1) # expected: ['bc', 'f', '']
I get the following error:
numba.core.errors.TypingError: Failed in cuda mode pipeline (step: nopython frontend)
Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<function _empty_string at 0x7f0456884d30>) found for signature:
>>> _empty_string(int64, int64, bool)
There are 2 candidate implementations:
- Of which 2 did not match due to:
Overload in function 'register_jitable.<locals>.wrap.<locals>.ov_wrap': File: numba/core/extending.py: Line 159.
With argument(s): '(int64, int64, bool)':
Rejected as the implementation raised a specific error:
NumbaRuntimeError: Failed in nopython mode pipeline (step: native lowering)
NRT required but not enabled
During: lowering "s = call $10load_global.3(kind, char_width, length, is_ascii, func=$10load_global.3, args=[Var(kind, unicode.py:276), Var(char_width, unicode.py:276), Var(length, unicode.py:276), Var(is_ascii, unicode.py:276)], kws=(), vararg=None, varkwarg=None, target=None)" at /mnt/local-raid10/workspace/user/anaconda3/envs/condaenv/lib/python3.9/site-packages/numba/cpython/unicode.py (277)
raised from /mnt/local-raid10/workspace/user/anaconda3/envs/condaenv/lib/python3.9/site-packages/numba/core/runtime/context.py:19
During: resolving callee type: Function(<function _empty_string at 0x7f0456884d30>)
During: typing of call at /mnt/local-raid10/workspace/user/anaconda3/envs/condaenv/lib/python3.9/site-packages/numba/cpython/unicode.py (1700)
File "../../anaconda3/envs/condaenv/lib/python3.9/site-packages/numba/cpython/unicode.py", line 1700:
def getitem_slice(s, idx):
<source elided>
# It's heterogeneous in kind OR stride != 1
ret = _empty_string(kind, span, is_ascii)
^
During: typing of intrinsic-call at /mnt/local-raid10/workspace/user/trim/trim_new_implementation/string_numba.py (143)
File "string_numba.py", line 143:
def cuda_clipper(arr1,arr2):
<source elided>
i = cuda.grid(1)
arr1[i] = arr1[i][arr2[i]:]
^
I am under the impression that slicing the string is the problem, since a similar implementation works fine with a numeric array. I have tried converting arr1 into an array of arrays, but that preprocessing itself takes long enough that it makes CUDA useless for improving performance. How can I work with str directly within Numba, rather than circumventing the problem?
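For illustration, the array-of-arrays preprocessing mentioned above might look something like the following sketch. It is not the original code: the helper names (encode_fixed_width, clip_rows, decode_fixed_width) are hypothetical, it assumes ASCII-only strings, and the clipping loop runs on the CPU with NumPy as a stand-in for the per-element work a kernel would do on the encoded array.

```python
import numpy as np


def encode_fixed_width(strings):
    """Pack ASCII strings into a 2-D uint8 array, one NUL-padded row per string."""
    byte_arr = np.array(strings, dtype="S")  # fixed-width byte strings (assumes ASCII)
    width = byte_arr.dtype.itemsize          # length of the longest string in bytes
    return byte_arr.view(np.uint8).reshape(len(strings), width)


def clip_rows(chars, offsets):
    """Drop the first offsets[i] characters of row i, shifting the rest left."""
    out = np.zeros_like(chars)
    width = chars.shape[1]
    for i in range(chars.shape[0]):
        n = min(int(offsets[i]), width)      # never clip more than the row holds
        out[i, : width - n] = chars[i, n:]
    return out


def decode_fixed_width(chars):
    """Turn the 2-D uint8 array back into a list of Python strings."""
    flat = np.ascontiguousarray(chars).view(f"S{chars.shape[1]}").ravel()
    return [b.decode("ascii") for b in flat.tolist()]  # trailing NULs are dropped


arr1 = ["abc", "def", "xyz"]
arr2 = [1, 2, 3]
chars = encode_fixed_width(arr1)
print(decode_fixed_width(clip_rows(chars, arr2)))  # ['bc', 'f', '']
```

The encode and decode steps are the overhead in question: both touch every character on the host before and after any device work.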