0

Why didn't the the commit 7138f470f0e55f2ebdb7638ddc4dfe2e78671403 trigger a new major version of dask since the function read_metadata is incompatible with older versions? The commit introduced the return of 4 values, but the old version only returned 3. According to semantic versioning this would have been the correct behavior.

cudf got broken, because of that commit.

Code from the issue:

>>> import cudf
>>> import dask_cudf
>>> dask_cudf.from_cudf(cudf.DataFrame({'a':[1,2,3]}),npartitions=1).to_parquet('test_parquet')

>>> dask_cudf.read_parquet('test_parquet')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/nvme/0/vjawa/conda/envs/cudf_15_june_25/lib/python3.7/site-packages/dask_cudf/io/parquet.py", line 213, in read_parquet
    **kwargs,
  File "/nvme/0/vjawa/conda/envs/cudf_15_june_25/lib/python3.7/site-packages/dask/dataframe/io/parquet/core.py", line 234, in read_parquet
    **kwargs
  File "/nvme/0/vjawa/conda/envs/cudf_15_june_25/lib/python3.7/site-packages/dask_cudf/io/parquet.py", line 17, in read_metadata
    meta, stats, parts, index = ArrowEngine.read_metadata(*args, **kwargs)
ValueError: not enough values to unpack (expected 4, got 3)

dask_cudf==0.14 is only compatible with dask<=0.19. In dask_cudf==0.16 the issue is fixed.

Edit: Link to the issue

JulianWgs
  • 961
  • 1
  • 14
  • 25

1 Answers1

1

Whilst Dask does not have a concrete policy around the value of the version string, one could argue that in this particular case, the IO code is non-core, and largely being pushed by upstream (pyarrow) development rather than our own initiative.

We are sorry that your code broke, but of course picking the correct versions of packages and expecting downstream packages to catch up is part of the open source ecosystem.

You may want to raise this as a github issue, if you'd like to get input from more of the dask maintenance team. (There isn't really much to "answer" here, from a stackoverflow perspective)

mdurant
  • 27,272
  • 5
  • 45
  • 74
  • Thank you for the fast response! Will do. Thought I would ask here first in case I've missed something. – JulianWgs Aug 17 '20 at 18:06