
I am trying to run a SQL query on a 50 GB CSV file, but my GPU memory is only 40 GB. How can I do the processing?
Also, I am only able to run BlazingSQL with the Jupyter notebook that ships with their Docker image. Can anyone please help me install it locally?

It does not seem possible with the conda command available on their GitHub.

1 Answer


One way to do this today is to use Dask-SQL. Because it's built on Dask, Dask-SQL inherits Dask's ability to handle larger-than-memory workloads.

The easiest way to install Dask-SQL and use GPUs is to create a conda environment or pull a Docker container using the RAPIDS release selector.
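As a rough sketch of what this looks like in practice (the file path, table name, and column names below are placeholders, not from your setup), you can read the CSV with dask_cudf and query it through a Dask-SQL `Context`:

```python
# Minimal sketch: query a larger-than-GPU-memory CSV with Dask-SQL on GPUs.
# The file path, table name, and column names are placeholders.
import dask_cudf
from dask_sql import Context

# Read the CSV into a partitioned, GPU-backed Dask DataFrame.
# Partitions are processed a chunk at a time, so the full 50 GB
# never has to fit in GPU memory at once.
ddf = dask_cudf.read_csv("/data/large_file.csv")

# Register the DataFrame as a SQL table and run a query against it.
c = Context()
c.create_table("my_table", ddf)

result = c.sql("""
    SELECT some_column, COUNT(*) AS n
    FROM my_table
    GROUP BY some_column
""")

# .compute() triggers execution and brings the (small) result back.
print(result.compute())
```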

Nick Becker
  • I have read that dask-sql is still under development, so I think we should not use it in production, right? – Soumya Bhattacharjee Apr 07 '22 at 04:25
  • While most analytics tools are always under development, you are right that Dask-SQL is less mature. If you need a more production-ready accelerated SQL solution, you may be interested in [Spark RAPIDS](https://nvidia.github.io/spark-rapids/) – Nick Becker Apr 07 '22 at 12:58
  • Yes, I am comparing the results between Spark RAPIDS and BlazingSQL. BlazingSQL is much, much faster than Spark RAPIDS. But the problem is that we have to implement this in our big-data production environment, where we deal with over 10 TB of data, and it is not possible for us to allocate enough GPUs to fit 10 TB of data in GPU memory. Can you give me some idea why Spark RAPIDS is slower than BlazingSQL when both use RAPIDS? – Soumya Bhattacharjee Apr 08 '22 at 06:45