1

I have the following dask_cudf.core.DataFrame:-

import pandas as pd
import numpy as np
import dask_cudf
import cudf


data = {"x":range(1,21), "nor":np.random.normal(2, 4, 20), "unif":np.random.uniform(size = 20)}
df = cudf.DataFrame(data)
ddf = dask_cudf.from_cudf(df, npartitions = 2)
ddf.compute()

I wanted to create 1st till 5th lagged values for the columns nor and unif. However, I create them the following way:-

colz = ["nor", "unif"]
ddf[[s + "_" + str(1) for s in colz]] = ddf[colz].shift(1)
ddf[[s + "_" + str(2) for s in colz]] = ddf[colz].shift(2)

I can create the first and the second lagged value, but not more than that. When I run shift with a value more than 2, I get the following error::-

    /usr/local/lib/python3.7/site-packages/dask/dataframe/utils.py in raise_on_meta_error(funcname, udf)
    175     try:
--> 176         yield
    177     except Exception as e:

16 frames
cudf/_lib/copying.pyx in cudf._lib.copying.shift()

RuntimeError: parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.7/site-packages/dask/dataframe/utils.py in raise_on_meta_error(funcname, udf)
    195         )
    196         msg = msg.format(f" in `{funcname}`" if funcname else "", repr(e), tb)
--> 197         raise ValueError(msg) from e
    198
    199

ValueError: Metadata inference failed in `shift`.

Original error is below:
------------------------
RuntimeError('parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument')

Traceback:
---------
  File "/usr/local/lib/python3.7/site-packages/dask/dataframe/utils.py", line 176, in raise_on_meta_error
    yield
  File "/usr/local/lib/python3.7/site-packages/dask/dataframe/core.py", line 5833, in _emulate
    return func(*_extract_meta(args, True), **_extract_meta(kwargs, True))
  File "/usr/local/lib/python3.7/site-packages/dask/utils.py", line 1021, in __call__
    return getattr(__obj, self.method)(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/cudf/core/frame.py", line 1788, in shift
    return self._shift(periods)
  File "/usr/local/lib/python3.7/site-packages/cudf/core/frame.py", line 1793, in _shift
    zip(self._column_names, data_columns), self._index
  File "/usr/local/lib/python3.7/site-packages/cudf/core/dataframe.py", line 818, in _from_data
    out = super()._from_data(data, index)
  File "/usr/local/lib/python3.7/site-packages/cudf/core/frame.py", line 140, in _from_data
    Frame.__init__(obj, data, index)
  File "/usr/local/lib/python3.7/site-packages/cudf/core/frame.py", line 78, in __init__
    self._data = cudf.core.column_accessor.ColumnAccessor(data)
  File "/usr/local/lib/python3.7/site-packages/cudf/core/column_accessor.py", line 121, in __init__
    data = dict(data)
  File "/usr/local/lib/python3.7/site-packages/cudf/core/frame.py", line 1791, in <genexpr>
    data_columns = (col.shift(offset, fill_value) for col in self._columns)
  File "/usr/local/lib/python3.7/site-packages/cudf/core/column/column.py", line 391, in shift
    return libcudf.copying.shift(self, offset, fill_value)
  File "cudf/_lib/copying.pyx", line 633, in cudf._lib.copying.shift

I can't seem to understand why this is happening.

Shawn Brar
  • 1,346
  • 3
  • 17

1 Answers1

0

Thanks for your min repro; works fine with a minor change. Don't .compute() dask too early. If you need to do something and continue processing in dask/dask_cudf, use please use .persist()

import pandas as pd
import numpy as np
import dask_cudf
import cudf


data = {"x":range(1,21), "nor":np.random.normal(2, 4, 20), "unif":np.random.uniform(size = 20)}
df = cudf.DataFrame(data)
ddf = dask_cudf.from_cudf(df, npartitions = 2)
colz = ["nor", "unif"]
ddf[[s + "_" + str(1) for s in colz]] = ddf[colz].shift(1)
ddf[[s + "_" + str(2) for s in colz]] = ddf[colz].shift(2)
ddf[[s + "_" + str(3) for s in colz]] = ddf[colz].shift(3)
ddf[[s + "_" + str(5) for s in colz]] = ddf[colz].shift(5)
ddf.compute()

output

    x   nor unif    nor_1   unif_1  nor_2   unif_2  nor_3   unif_3  nor_5   unif_5
0   1   3.711132    0.021615    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
1   2   -2.465054   0.081927    3.711131915 0.021614727 <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
2   3   1.543548    0.481731    -2.465054359    0.081927168 3.711131915 0.021614727 <NA>    <NA>    <NA>    <NA>
3   4   8.820771    0.040135    1.543548323 0.481731194 -2.465054359    0.081927168 3.711131915 0.021614727 <NA>    <NA>
4   5   0.233656    0.135811    8.82077073  0.040135259 1.543548323 0.481731194 -2.465054359    0.081927168 <NA>    <NA>
5   6   2.526556    0.360873    0.23365638  0.135810979 8.82077073  0.040135259 1.543548323 0.481731194 3.711131915 0.021614727
6   7   2.799205    0.383579    2.526555817 0.360873336 0.23365638  0.135810979 8.82077073  0.040135259 -2.465054359    0.081927168
7   8   5.960305    0.362417    2.799205226 0.383579063 2.526555817 0.360873336 0.23365638  0.135810979 1.543548323 0.481731194
8   9   1.878898    0.609364    5.960304782 0.362416925 2.799205226 0.383579063 2.526555817 0.360873336 8.82077073  0.040135259
9   10  1.217635    0.041408    1.878898482 0.609364119 5.960304782 0.362416925 2.799205226 0.383579063 0.23365638  0.135810979
10  11  0.580250    0.128405    1.21763458  0.04140812  1.878898482 0.609364119 5.960304782 0.362416925 2.526555817 0.360873336
11  12  4.907322    0.708164    0.580249571 0.128405085 1.21763458  0.04140812  1.878898482 0.609364119 2.799205226 0.383579063
12  13  6.591673    0.105310    4.907321929 0.708164063 0.580249571 0.128405085 1.21763458  0.04140812  5.960304782 0.362416925
13  14  -2.974896   0.587859    6.591673409 0.105310053 4.907321929 0.708164063 0.580249571 0.128405085 1.878898482 0.609364119
14  15  2.284847    0.978458    -2.974896021    0.587858754 6.591673409 0.105310053 4.907321929 0.708164063 1.21763458  0.04140812
15  16  -5.616458   0.114736    2.28484689  0.97845785  -2.974896021    0.587858754 6.591673409 0.105310053 0.580249571 0.128405085
16  17  -3.003533   0.279865    -5.616457873    0.114736009 2.28484689  0.97845785  -2.974896021    0.587858754 4.907321929 0.708164063
17  18  0.241106    0.923462    -3.003532592    0.279864688 -5.616457873    0.114736009 2.28484689  0.97845785  6.591673409 0.105310053
18  19  -2.100202   0.613850    0.241106056 0.923462497 -3.003532592    0.279864688 -5.616457873    0.114736009 -2.974896021    0.587858754
19  20  8.364832    0.929587    -2.100201941    0.613850209 0.241106056 0.923462497 -3.003532592    0.279864688 2.28484689  0.97845785
​
TaureanDyerNV
  • 1,208
  • 8
  • 9