
I want to understand the efficient memory management process for Dask objects. I have set up a Dask GPU cluster and I am able to execute tasks that run across the cluster. However, with the Dask objects, especially when I run the compute function, the process that runs on the GPU quickly grows, using more and more memory, and soon I get an out-of-memory error.

I want to understand how I can release the memory held by a Dask object once I am done using it. In the following example, how can I release the object after the compute call? I run the following code a few times, and the memory of the process it runs in keeps growing.

import cupy as cp
import cudf
import dask_cudf

nrows = 100000000

# build a single-GPU cudf DataFrame, then wrap it as a dask_cudf DataFrame
df2 = cudf.DataFrame({'a': cp.arange(nrows), 'b': cp.arange(nrows)})
ddf2 = dask_cudf.from_cudf(df2, npartitions=5)
ddf2['c'] = ddf2['a'] + 5
ddf2

# materialize the full result back into a single cudf DataFrame
ddf2.compute()

1 Answer


Please check this blog post by Nick Becker. You may want to set up a client first.
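For reference, here is a minimal sketch of bringing up a single-node GPU cluster and client with dask-cuda; the LocalCUDACluster defaults (one worker per visible GPU) are assumed and may need tuning for your machine:

from dask.distributed import Client
from dask_cuda import LocalCUDACluster

cluster = LocalCUDACluster()   # one Dask worker per visible GPU by default
client = Client(cluster)
print(client)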

You read into cudf first, which you shouldn't do as a general practice. You should read directly into dask_cudf.
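As a rough sketch of the difference (the file path data.csv is hypothetical; the point is to let dask_cudf build its partitions directly instead of materializing the whole frame on one GPU first):

import dask_cudf

# avoid: df = cudf.read_csv("data.csv"); ddf = dask_cudf.from_cudf(df, npartitions=5)
ddf = dask_cudf.read_csv("data.csv")              # read straight into a partitioned GPU DataFrame
# ddf = dask_cudf.read_parquet("data/*.parquet")  # same idea for Parquet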

When dask_cudf computes, the result is returned as a single cudf DataFrame, which MUST fit into the remaining memory of your GPU. Chances are that reading into cudf first already took a chunk of that memory.
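As a sketch using the ddf2 from the question, you can keep the result distributed with persist(), or only pull back something small, rather than compute()-ing the whole frame onto a single GPU:

ddf2 = ddf2.persist()               # materialize partitions on the workers, stay distributed
small = ddf2['c'].sum().compute()   # a reduction is tiny and safe to bring back
preview = ddf2.head()               # head() only needs the first partition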

Then, when you are done with a Dask object, you can delete it and release it with client.cancel().
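A minimal sketch of that cleanup, assuming client is the distributed Client from above and ddf2 has been persisted:

import gc

client.cancel(ddf2)       # release the pieces of ddf2 held by the workers
del ddf2                  # drop the local reference
gc.collect()              # free any local cudf result that was created
# client.run(gc.collect)  # optionally trigger garbage collection on every worker too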

TaureanDyerNV
  • Thanks for the response. I am already using the client and have no issues bringing it up to execute the code. Regarding reading directly into dask_cudf, is there a function that can create a DataFrame from a list (one option is sketched below)? The examples I have seen so far create a cudf DataFrame first and then use it for dask_cudf, like in the example I shared above and in the link below: https://github.com/rapidsai/dask-cudf/blob/branch-0.9/dask_cudf/tests/test_core.py. I have also tried deleting the object with client.cancel(); even after I delete the object, the GPU memory used by the process is not going down. – Newbie_Python May 06 '21 at 19:05
  • I see that you're using some old documentation. That is RAPIDS version 0.9; 0.19 is our most recent stable release. :) You should also expect the GPU memory to reduce, but if you are still going out of memory, you are likely hitting one of three things: 1. your resulting dataframe is too big, 2. you may require split_out or RMM (sketched below), 3. you need more GPU memory. What GPUs are you using? – TaureanDyerNV May 24 '21 at 00:30
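Regarding the question in the comments about building the frame without a full cudf DataFrame up front: one common pattern is to construct each partition lazily with dask.delayed and combine them with dask.dataframe.from_delayed. This is only a sketch; the per-partition helper and the partitioning scheme are illustrative:

import cupy as cp
import cudf
import dask
import dask.dataframe as dd

def make_partition(start, stop):
    # builds one partition-sized cudf DataFrame when the graph executes
    rng = cp.arange(start, stop)
    return cudf.DataFrame({'a': rng, 'b': rng})

nrows, nparts = 100_000_000, 5
step = nrows // nparts
parts = [dask.delayed(make_partition)(i * step, (i + 1) * step) for i in range(nparts)]
ddf2 = dd.from_delayed(parts)   # cudf partitions, so the collection is GPU-backed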
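As for the split_out and RMM suggestions in the last comment, a rough sketch under the assumption that dask-cuda is used; the pool size and the groupby are illustrative only:

from dask.distributed import Client
from dask_cuda import LocalCUDACluster

# give each worker an RMM memory pool to reduce allocation overhead and fragmentation
cluster = LocalCUDACluster(rmm_pool_size="24GB")   # size is illustrative
client = Client(cluster)

# keep a large groupby result partitioned instead of collapsing it into one partition
# grouped = ddf2.groupby('a').b.sum(split_out=4)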