I have a large file that I want to load using cudf.read_csv(). The file is too large to fit in a single GPU's memory, but small enough to fit in CPU memory. I can load it with pd.read_csv(), but that takes forever! On smaller (though still quite large) files, cudf.read_csv() is around 6-10x faster than pandas.
When using cudf.read_csv(), I notice that only 1 of the 4 available Tesla V100-DGXS GPUs actually loads data; the rest sit idle. I imagine that if all 4 were used, the file would fit into memory. How can I use all 4 GPUs to load the file?
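For reference, this is roughly what I am doing now (the file name is a placeholder for the real path):

    import cudf

    # Single call to cudf.read_csv(); only one GPU is ever used for this.
    gdf = cudf.read_csv('file.csv')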
Note: I know I can use a hack like cudf.read_csv('file.csv', usecols=FIRST_n_COLS) and sequentially load batches of columns, roughly as sketched below. While each batch would fit into memory, I would prefer a more elegant solution if possible.
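The column-batching workaround I have in mind would look something like this (the batch size and the header read via pandas are just placeholders for illustration):

    import pandas as pd
    import cudf

    # Read only the header on the CPU to get the column names.
    all_cols = pd.read_csv('file.csv', nrows=0).columns.tolist()

    # Load the columns in fixed-size batches (batch size is arbitrary here);
    # each element of `parts` holds one batch of columns on the default GPU.
    batch_size = 25
    parts = []
    for i in range(0, len(all_cols), batch_size):
        cols = all_cols[i:i + batch_size]
        parts.append(cudf.read_csv('file.csv', usecols=cols))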