While I'll help you with your issue of not accessing all the GPUs, first a performance tip: if all of your data fits on a single GPU, you should stick with single-GPU processing using cudf, as it's much faster since there's no orchestration overhead. If not, read on :)
The reason you're not utilizing the 4 GPUs is that you're not using dask-cudf. cudf is a single-GPU library; dask-cudf lets you scale it out to multiple GPUs and multiple nodes, or process datasets that are larger than GPU memory.
Here is a great place to start: https://docs.rapids.ai/api/cudf/stable/10min.html
As for your speed issue: you should be reading the CSV directly into GPU memory with cudf, if possible. In your code you're reading the data twice, once to host [CPU] with pandas and then again into cudf [GPU] from pandas. That's unnecessary, and you lose all the benefit of GPU-accelerated reading. On large datasets, cudf gives you a pretty nice file-read speedup compared to pandas.
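On a single GPU, that direct read is just cudf.read_csv. Here's a minimal sketch (reusing the testset.csv name from your example):

import cudf

# read the CSV straight into GPU memory, no pandas hop needed
gdf = cudf.read_csv("testset.csv")
print(gdf.head())

The multi-GPU equivalent with dask-cudf looks like this: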
import dask_cudf

# dask_cudf.read_csv has no npartitions argument -- it splits the file by
# byte size -- so repartition afterwards if you want a specific partition count
df = dask_cudf.read_csv("testset.csv")
df = df.repartition(npartitions=4)  # or whatever multiple of the # of GPUs you have
and then go from there. Be sure to set up a Dask client first; the Dask performance tips on that same page cover this: https://docs.rapids.ai/api/cudf/stable/10min.html#Dask-Performance-Tips. No for loops required :)
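For a single machine with multiple GPUs, the client setup is usually a LocalCUDACluster from dask-cuda. A minimal sketch:

from dask_cuda import LocalCUDACluster
from dask.distributed import Client

# spins up one worker per visible GPU on this machine
cluster = LocalCUDACluster()
client = Client(cluster)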
For the rest of it, I'm assuming you're using cuml for your machine learning algos, like ARIMA: https://docs.rapids.ai/api/cuml/stable/api.html?highlight=arima#cuml.tsa.ARIMA. Here is an example notebook: https://github.com/rapidsai/cuml/blob/branch-0.14/notebooks/arima_demo.ipynb
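If it helps, here's a minimal sketch of fitting cuml's ARIMA on one column of a cudf DataFrame (the "value" column name and the order are just placeholders; see the notebook above for a fuller example):

import cudf
from cuml.tsa import ARIMA

gdf = cudf.read_csv("testset.csv")             # file name from your example
model = ARIMA(gdf["value"], order=(1, 1, 1))   # "value" is a hypothetical column
model.fit()
forecast = model.forecast(10)                  # predict the next 10 steps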