I am using cudf (dask-cudf) to handle tens of billions of rows of social media data. I'm trying to use query to extract only the relevant users from the parent data set.
However, unlike pandas, cudf's query raises an error when I pass in a list or set.
My environment is Anaconda with RAPIDS 22.12 and CUDA 11.4.
The error is as follows:
TypingError: Failed in cuda mode pipeline (step: nopython frontend)
Internal error at <numba.core.typeinfer.CallConstraint object at 0x7f381a6097f0>.
Failed in cuda mode pipeline (step: native lowering)
Failed in nopython mode pipeline (step: native lowering)
NRT required but not enabled
During: lowering "$6for_iter.1 = iternext(value=$phi6.0)" at /home/user/.pyenv/versions/anaconda3-2020.11/envs/rapids-22.12/lib/python3.8/site-packages/numba/cpython/listobj.py (664)
During: lowering "$6compare_op.2 = src in __CUDF_ENVREF__test" at <string> (2)
During: resolving callee type: type(CUDADispatcher(<function queryexpr_5ee033e5bcab9f09 at 0x7f381b909ee0>))
During: typing of call at <string> (6)
Enable logging at debug level for details.
File "<string>", line 6:
<source missing, REPL/exec in use?>
The test code is as follows. Here df is a cudf.DataFrame holding an edge list with "src" and "dst" columns:
test = list(test_userid)[0:2]
df.query("(src == @test) or (dst == @test)")  # works only when test is a single value, not a list
df.query("src.isin(@test)")  # fails with a list
df.query("src in @test")  # fails with a list
df.query("src == @test")  # fails with a list
Using query is not essential, so if there is another way to extract these rows, I would like to know that as well.
I have confirmed that the same code extracts the rows successfully in pandas. The cudf query also works correctly when test is a single value rather than a list. I would expect it to work when a list is passed to cudf as well.
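To illustrate the kind of non-query extraction I mean, here is a minimal sketch using a boolean mask built with Series.isin. It assumes that cudf's Series.isin accepts a plain Python list the same way pandas does. The edge list and user IDs are made up for illustration, and the import falls back to pandas so the snippet also runs on a machine without a GPU.

```python
# Use cudf when it is available; otherwise fall back to pandas,
# whose API this sketch mirrors.
try:
    import cudf as xdf
except Exception:  # cudf not installed, or no usable GPU
    import pandas as xdf

# Hypothetical edge list with "src" and "dst" columns.
df = xdf.DataFrame({
    "src": [1, 2, 3, 4, 5],
    "dst": [2, 3, 1, 5, 6],
})

# Hypothetical list of relevant user IDs.
test = [1, 5]

# Boolean mask: keep edges whose src or dst is one of the relevant users.
mask = df["src"].isin(test) | df["dst"].isin(test)
result = df[mask]

print(result)
```

I have not checked how this behaves on the full data set through dask-cudf, so it is only a starting point rather than a confirmed solution.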