
I am trying to read a 12 GB CSV file.
When I read it with cuDF, I get a memory error:

MemoryError: std::bad_alloc: CUDA error at: /usr/local/envs/bsql/include/rmm/mr/device/cuda_memory_resource.hpp:69: cudaErrorMemoryAllocation out of memory

But when I read it with dask_cudf on a LocalCUDACluster, there is no memory issue.
My question is: if both are using a single GPU, why does one run into memory issues while the other does not?
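
For context, a minimal sketch of the two approaches being compared (the file path and cluster options are placeholders, not the original code):

```python
import cudf
import dask_cudf
from dask_cuda import LocalCUDACluster
from dask.distributed import Client

# Approach 1: plain cuDF, which parses the entire CSV into GPU memory in one go
gdf = cudf.read_csv("data.csv")        # fails with std::bad_alloc on the 12 GB file

# Approach 2: dask_cudf on a single-GPU LocalCUDACluster
cluster = LocalCUDACluster()           # one worker for the single visible GPU
client = Client(cluster)
ddf = dask_cudf.read_csv("data.csv")   # builds a lazy, partitioned DataFrame
```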

  • `dask_cudf.read_csv` returns a lazy object that has not yet been evaluated. Do you still see this memory error when you call `compute()`? – scj13 Mar 16 '22 at 23:36
  • I do not see a memory error in dask_cudf even after `compute()`. I ran `df.count().compute()` and it executed perfectly. The issue is only with cuDF. – Soumya Bhattacharjee Mar 17 '22 at 10:02
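
A short sketch of the lazy-vs-evaluated distinction discussed in these comments (the file path is a placeholder):

```python
import dask_cudf

# read_csv only builds a task graph over partitions; nothing is parsed yet
ddf = dask_cudf.read_csv("data.csv")

# compute() triggers execution; partitions are read and reduced a few at a time,
# so peak GPU memory stays far below the size of the full dataset
row_counts = ddf.count().compute()
print(row_counts)
```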

1 Answer


In the same way that Dask allows you to work with data that does not fit in a single machine's CPU memory by splitting it into partitions, Dask-cuDF allows you to work with data that does not fit in a single GPU's memory. Therefore, if your dataset takes up more memory than you can fit in a single GPU, you should use Dask-cuDF instead of cuDF (see the RAPIDS docs).
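
As a rough sketch of what that looks like in practice (the path and split size are placeholders, and the parameter name varies by RAPIDS version; older dask_cudf releases call it `chunksize` rather than `blocksize`):

```python
import dask_cudf
from dask_cuda import LocalCUDACluster
from dask.distributed import Client

cluster = LocalCUDACluster()    # one Dask worker per visible GPU
client = Client(cluster)

# The CSV is split into ~256 MiB partitions; each partition becomes its own
# cuDF DataFrame, so the 12 GB file never has to sit in GPU memory all at once.
ddf = dask_cudf.read_csv("data.csv", blocksize="256 MiB")

print(ddf.npartitions)          # how many partitions the file was split into
print(ddf.count().compute())    # executes partition by partition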

scj13
  • But my dataset is only 12 GB and GPU memory is 40 GB. – Soumya Bhattacharjee Mar 21 '22 at 05:17
  • Though the size of the CSV is 12 GB on disk, it will be much larger when loaded in memory. Exactly how much is dataset-specific, but [here is one comparison as an example](https://stackoverflow.com/a/31543407/17015034). Dask-cuDF doesn't need to load the entire dataset in memory at once as cuDF does. – scj13 Mar 21 '22 at 17:43
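
One way to see the on-disk vs. in-memory gap for a particular dataset is to load a small slice and measure it (a sketch; `nrows` and the path are placeholders):

```python
import cudf

# Load only the first million rows and check how much GPU memory they occupy.
sample = cudf.read_csv("data.csv", nrows=1_000_000)
bytes_on_gpu = sample.memory_usage(deep=True).sum()
print(f"{bytes_on_gpu / 1e9:.2f} GB on the GPU for 1M rows")
```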