
PCIe bus bandwidth and latency impose constraints on how and when applications should copy data to and from GPUs.

When working with cuDF directly, I can efficiently move a single large chunk of data into a single DataFrame.

When using dask_cudf to partition my DataFrames, does Dask copy partitions into GPU memory one at a time? In batches? If so, is there significant overhead from multiple copy operations instead of a single larger copy?
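For concreteness, here is a minimal sketch of the two patterns being compared. The column name, sizes, and partition count are invented for illustration, and the `map_partitions` route is just one common way to express the partitioned version; check the current dask_cudf docs for the recommended construction:

```python
import numpy as np
import pandas as pd
import cudf
import dask.dataframe as dd

pdf = pd.DataFrame({"x": np.random.rand(10_000_000)})

# Direct cudf: one large host->device copy.
gdf = cudf.DataFrame.from_pandas(pdf)

# Partitioned: the frame is split into npartitions pieces, and each
# partition is moved to the GPU by its own cudf call (its own copy).
ddf = dd.from_pandas(pdf, npartitions=8).map_partitions(cudf.DataFrame.from_pandas)
print(ddf["x"].sum().compute())
```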

Randy Gelhausen

1 Answer


This probably depends on the scheduler you're using. As of 2019-02-19, dask-cudf uses the single-threaded scheduler by default (cudf segfaulted for a while if used from multiple threads), so transfers would be sequential unless you're using a dask.distributed cluster. With a dask.distributed cluster, transfers would presumably happen concurrently across each of your GPUs.
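As a hedged sketch of the distributed setup, using `LocalCUDACluster` from the dask_cuda package (names as of the time of writing; check current docs):

```python
from dask.distributed import Client
from dask_cuda import LocalCUDACluster

# One worker process per visible GPU; each worker handles its own
# partitions, so host->device copies can proceed concurrently per GPU.
cluster = LocalCUDACluster()
client = Client(cluster)

# Without a cluster, you can pin the default scheduler explicitly:
# import dask
# dask.config.set(scheduler="single-threaded")
```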

It's worth noting that dask.dataframe + cudf doesn't do anything special on top of what cudf itself does. It's as though you made many cudf calls in a for loop, or in one for loop per GPU, depending on the scheduler choice above; the sketch below makes this concrete.
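Roughly what the single-threaded schedule amounts to (the chunking and column name are made up for illustration):

```python
import numpy as np
import pandas as pd
import cudf

# Each partition is an independent cudf call with its own host->device copy,
# executed one after another on the single-threaded scheduler.
chunks = np.array_split(pd.DataFrame({"x": np.random.rand(1_000_000)}), 8)
partial = []
for chunk in chunks:
    gdf = cudf.DataFrame.from_pandas(chunk)  # one transfer per partition
    partial.append(float(gdf["x"].sum()))
print(sum(partial))
```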

Disclaimer: cudf and dask-cudf are in heavy flux. Future readers probably should check with current documentation before trusting this answer.

MRocklin