
I am trying to read a 12 GB CSV file.
When I read it with cuDF, I get a memory error:

MemoryError: std::bad_alloc: CUDA error at: /usr/local/envs/bsql/include/rmm/mr/device/cuda_memory_resource.hpp:69: cudaErrorMemoryAllocation out of memory

But when I read it with dask_cudf on a LocalCUDACluster, there is no memory issue.
My question is: if both are using a single GPU, why does one run into memory issues while the other does not?
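
For context, a minimal sketch of the two approaches being compared (the file path and cluster options are placeholders, not the original code):

```python
import cudf
import dask_cudf
from dask_cuda import LocalCUDACluster
from dask.distributed import Client

# Approach 1: plain cuDF, which parses the entire CSV into GPU memory in one go
gdf = cudf.read_csv("data.csv")        # fails with std::bad_alloc on the 12 GB file

# Approach 2: dask_cudf on a single-GPU LocalCUDACluster
cluster = LocalCUDACluster()           # one worker for the single visible GPU
client = Client(cluster)
ddf = dask_cudf.read_csv("data.csv")   # builds a lazy, partitioned DataFrame
```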

  • `dask_cudf.read_csv` returns a lazy object that has not yet been evaluated. Do you still see this memory error when you call `compute()`? – scj13 Mar 16 '22 at 23:36
  • I do not see a memory error in dask_cudf even after `compute()`. I ran `df.count().compute()` and it executed perfectly. The issue is only with cuDF. – Soumya Bhattacharjee Mar 17 '22 at 10:02
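
A short sketch of the lazy-vs-evaluated distinction discussed in these comments (the file path is a placeholder):

```python
import dask_cudf

# read_csv only builds a task graph over partitions; nothing is parsed yet
ddf = dask_cudf.read_csv("data.csv")

# compute() triggers execution; partitions are read and reduced a few at a time,
# so peak GPU memory stays far below the size of the full dataset
row_counts = ddf.count().compute()
print(row_counts)
```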

1 Answer


In the same way that Dask allows you to work with data that does not fit in a single machine's CPU memory by splitting it into partitions, Dask-cuDF allows you to work with data that does not fit in a single GPU's memory. Therefore, if your dataset takes up more memory than you can fit in a single GPU, you should use Dask-cuDF instead of cuDF (see the RAPIDS docs).
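
As a rough sketch of what that looks like in practice (the path and split size are placeholders, and the parameter name varies by RAPIDS version; older dask_cudf releases call it `chunksize` rather than `blocksize`):

```python
import dask_cudf
from dask_cuda import LocalCUDACluster
from dask.distributed import Client

cluster = LocalCUDACluster()    # one Dask worker per visible GPU
client = Client(cluster)

# The CSV is split into ~256 MiB partitions; each partition becomes its own
# cuDF DataFrame, so the 12 GB file never has to sit in GPU memory all at once.
ddf = dask_cudf.read_csv("data.csv", blocksize="256 MiB")

print(ddf.npartitions)          # how many partitions the file was split into
print(ddf.count().compute())    # executes partition by partition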

scj13
  • But my dataset is only 12 GB and GPU memory is 40 GB. – Soumya Bhattacharjee Mar 21 '22 at 05:17
  • Though the size of the CSV is 12 GB on disk, it will be much larger when loaded in memory. Exactly how much is dataset-specific, but [here is one comparison as an example](https://stackoverflow.com/a/31543407/17015034). Dask-cuDF doesn't need to load the entire dataset in memory at once as cuDF does. – scj13 Mar 21 '22 at 17:43
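
One way to see the on-disk vs. in-memory gap for a particular dataset is to load a small slice and measure it (a sketch; `nrows` and the path are placeholders):

```python
import cudf

# Load only the first million rows and check how much GPU memory they occupy.
sample = cudf.read_csv("data.csv", nrows=1_000_000)
bytes_on_gpu = sample.memory_usage(deep=True).sum()
print(f"{bytes_on_gpu / 1e9:.2f} GB on the GPU for 1M rows")
```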