I have a large file that I want to load using cudf.read_csv(). The file in question is too large to fit in a single GPU's memory, but still small enough to fit into CPU memory. I can load the file with pd.read_csv(), but it takes forever! For smaller (but still quite large) files, cudf.read_csv() is around 6-10x faster than pandas.

When using cudf.read_csv(), I notice that only 1 of the 4 available Tesla V100-DGXS GPUs actually loads data. The rest sit idle. I imagine that if all 4 were used, the file would fit into memory. How can I use all 4 GPUs to load the file?

Note: I know I can use a hack like cudf.read_csv('file.csv', usecols=FIRST_n_COLS) and sequentially load batches of columns. While this would fit into memory, I would prefer a more elegant solution if possible.
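For context, here is a minimal sketch of that column-batching workaround. The file name 'file.csv' and the batch size of 50 columns are made up for illustration; each batch would have to be processed and freed before the next one is read, since all batches together do not fit on one GPU.

    import cudf
    import pandas as pd

    # Read only the header on the CPU to discover the column names.
    all_cols = pd.read_csv("file.csv", nrows=0).columns.tolist()

    batch_size = 50  # assumed number of columns that fits on one GPU
    for i in range(0, len(all_cols), batch_size):
        # Each call loads only a slice of the columns onto the GPU.
        gdf = cudf.read_csv("file.csv", usecols=all_cols[i:i + batch_size])
        # ... process gdf here, then drop it so the next batch fits in memory ...
        del gdf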

Ottpocket

1 Answer

If you have multiple GPUs and want to use all of them at once, please use dask_cudf. RAPIDS has a few guides for this, but @Nick Becker did a great job explaining it here: https://stackoverflow.com/a/58123478/1309051. That will get you on your way.
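For completeness, a minimal sketch of the dask_cudf approach (this is not from the linked answer; 'file.csv' stands in for your file, and the dask-cuda package is assumed to be installed):

    from dask_cuda import LocalCUDACluster
    from dask.distributed import Client
    import dask_cudf

    # Start one Dask worker per visible GPU (4 V100s in your case).
    cluster = LocalCUDACluster()
    client = Client(cluster)

    # Lazily partition the CSV across the workers, one chunk per GPU at a time.
    ddf = dask_cudf.read_csv("file.csv")

    # Operations on ddf execute across all GPUs; head() pulls a small sample back.
    print(ddf.head())

Because dask_cudf splits the file into partitions, no single GPU ever has to hold the whole dataset, which is exactly what you need here.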

TaureanDyerNV