I am trying to use NumPy from within Numba JIT-compiled code, but I get errors on standard NumPy operations such as numpy.ones_like, even though the Numba documentation lists the operation as supported.

Documentation link: Numba 0.46.

Edit: The method 'calc_method' works fine when I call it directly, but fails when used from within apply_chunks. So the problem is probably not Numba itself but how cudf's apply_chunks is being used.

Code:

import numba
from numba import jit
import cudf  # required for cudf.DataFrame below; missing from the original snippet
import pandas as pd
import numpy as np

print(numba.__version__)

@jit(nopython=True)
def calc_method(a, b):
    a1 = np.float64(a)
    b1 = np.float64(b)
    abc = (a1, np.ones_like(b1))
    abc_ht = np.hstack(abc)
    return abc_ht

def calculate(cudf_df: cudf.DataFrame, size_of_row: int):
    # Run calc_method on chunks of size_of_row rows
    return cudf_df.apply_chunks(calc_method, incols=['a', 'b'], outcols=dict(), chunks=size_of_row)

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7, 8], 'b': [11, 12, 13, 14, 15, 16, 17, 18]})
cudf_df = cudf.DataFrame.from_pandas(df)
a, b = calculate(cudf_df, 4)

Error:

TypingError                               Traceback (most recent call last)
<ipython-input-38-ad56fb75bc4a> in <module>
----> 1 a, b = calculate(cudf_df, 4)

TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Invalid use of Function(<numba.cuda.compiler.DeviceFunctionTemplate object at 0x7fa78521b550>) with argument(s) of type(s): (array(int64, 1d, A), array(int64, 1d, A))
 * parameterized
In definition 0:
    TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Use of unsupported NumPy function 'numpy.ones_like' or unsupported use of the function.

File "<ipython-input-37-97f7d707ba81>", line 9:
def calc_method(a,b):
    <source elided>
    b1 = np.float64(b)
    abc = (a1, np.ones_like(b1))
    ^

Can anyone tell me what I am doing wrong in the above example? Thanks in advance.

I also get a similar error for np.hstack.

Note: This is a simplified example to reproduce the issue.

Strider
  • Try removing the jit annotation, try replacing the pd.DataFrames with python lists. Keep simplifying it until it works. – andy boot Feb 12 '20 at 17:54
  • `apply_chunks` is probably passing a dataframe or Series to your function. `df.to_numpy()` is the preferred way of creating a numpy array from a dataframe. That said, I too get an error with `hstack` using arrays. Also, what do you hope to gain from `numba`? You aren't iterating on anything, and `np.concatenate` is already compiled code. So even if it works, `numba` might not save much time. – hpaulj Feb 12 '20 at 18:13
  • @hpaulj Good point, was just using this as a simplified example which reproduces the issue, the actual code has a lot more stuff going on. Will update the question with this note. – Strider Feb 13 '20 at 05:29
  • you might be interested in reading through this issue: https://github.com/dask/distributed/issues/3450 as it covers some recent issues of doing numba ufunc operations with dask – quasiben Feb 13 '20 at 15:24

1 Answer

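You cannot use any NumPy function that allocates memory from within a CUDA JIT kernel, which is what apply_chunks appears to compile your function into (note the numba.cuda DeviceFunctionTemplate in the traceback). Generally, you need to allocate your outputs ahead of time, and then set the values of those outputs in the kernel.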

You can see an example of using apply_chunks here: https://gist.github.com/beckernick/acbfb9e8ac4f0657789930a0dfb57d17#file-udf_apply_chunks_basic_example-ipynb
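To illustrate the "allocate first, fill inside the kernel" pattern, here is a minimal CPU-only sketch in plain NumPy. It is not cudf's implementation: `calc_kernel`, `run_in_chunks`, `out_a` and `out_ones` are hypothetical names made up for this example, and `run_in_chunks` only imitates a chunked driver so the idea can be tried without a GPU.

```python
import numpy as np


def calc_kernel(a, b, out_a, out_ones):
    """Write results into preallocated outputs; nothing is allocated here."""
    for i in range(a.size):
        out_a[i] = float(a[i])  # stands in for np.float64(a)
        out_ones[i] = 1.0       # stands in for np.ones_like(b)


def run_in_chunks(kernel, a, b, chunk_size):
    """Hypothetical stand-in for a chunked driver: allocate once, fill per chunk."""
    out_a = np.empty(a.size, dtype=np.float64)
    out_ones = np.empty(b.size, dtype=np.float64)
    for start in range(0, a.size, chunk_size):
        stop = min(start + chunk_size, a.size)
        # Basic slices are views, so the kernel writes into the full outputs.
        kernel(a[start:stop], b[start:stop], out_a[start:stop], out_ones[start:stop])
    return out_a, out_ones


a = np.arange(1, 9)
b = np.arange(11, 19)
out_a, out_ones = run_in_chunks(calc_kernel, a, b, chunk_size=4)
print(out_a, out_ones)
```

With cudf, the kernel would keep the same shape (inputs plus preallocated outputs as parameters, no return value), and the output columns and their dtypes would be declared through the outcols argument rather than allocated by hand, as the linked gist shows.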

Keith Kraus
  • What you have mentioned makes sense, but I was wondering why the official documentation lists 'ones_like' as supported, since it is always going to allocate memory. – Strider Feb 13 '20 at 05:33
  • It may work in the non CUDA jit mode where memory allocations aren't nearly as expensive relative to the compute. If it doesn't work in a pure CPU mode I'd suggest raising an issue / PR on the Numba github about incorrect documentation. – Keith Kraus Feb 18 '20 at 03:48
  • I just checked: the function works with np.float64, np.ones_like and np.hstack if I call 'calc_method' directly, but it doesn't work when the same method is called via cudf's apply_chunks. So I guess hstack does work in JIT-compiled code, as advertised. It's some combination of cudf.apply_chunks with JIT-compiled code that's causing the issue. @Keith: thanks for all the input, I'll reword the question a bit to highlight the cudf apply_chunks angle. – Strider Feb 19 '20 at 09:44