I have a large file that I want to load using cudf.read_csv(). The file is too large to fit in a single GPU's memory, but small enough to fit in CPU memory. I can load it with pd.read_csv(), but that takes forever! On smaller (though still quite large) files, cudf.read_csv() is around 6-10x faster than pandas.
When using cudf.read_csv(), I notice that only 1 of the 4 available Tesla V100-DGXS GPUs actually loads data; the rest sit idle. I imagine that if all 4 were used, the file would fit into memory. How can I use all 4 GPUs to load the file?
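For reference, this is roughly what I am doing now (the file name is a placeholder for the real path):

    import cudf

    # Single call to cudf.read_csv(); only one GPU is ever used for this.
    gdf = cudf.read_csv('file.csv')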
Note: I know I can use a hack like cudf.read_csv('file.csv', usecols=FIRST_n_COLS) and sequentially load batches of columns, roughly as sketched below. While each batch would fit into memory, I would prefer a more elegant solution if possible.
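The column-batching workaround I have in mind would look something like this (the batch size and the header read via pandas are just placeholders for illustration):

    import pandas as pd
    import cudf

    # Read only the header on the CPU to get the column names.
    all_cols = pd.read_csv('file.csv', nrows=0).columns.tolist()

    # Load the columns in fixed-size batches (batch size is arbitrary here);
    # each element of `parts` holds one batch of columns on the default GPU.
    batch_size = 25
    parts = []
    for i in range(0, len(all_cols), batch_size):
        cols = all_cols[i:i + batch_size]
        parts.append(cudf.read_csv('file.csv', usecols=cols))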