
I'd like to use this package as a data backend to expose an API/website with data analysis.

How is parallelization done in this package? Is it possible to control the resources consumed?

Br

Vaidehi Jamankar
Devyl

1 Answer


Is it possible to control the resources consumed?

You can set the POLARS_MAX_THREADS environment variable. It is used to set the size of the thread pool at startup.

See all configurable env vars here: https://pola-rs.github.io/polars/polars/index.html#config-with-env-vars
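For example, a minimal sketch of wiring this into a Python process (the value 4 is just a placeholder; you can equally set the variable in the shell before launching your app):

```python
import os

# POLARS_MAX_THREADS must be set before polars is imported:
# the thread pool is sized once, at startup.
os.environ["POLARS_MAX_THREADS"] = "4"

import polars as pl

# Check the size of the global thread pool. The helper is named
# pl.threadpool_size() in older releases; newer versions may expose
# it as pl.thread_pool_size().
print(pl.threadpool_size())
```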

How is parallelization done in this package?

Can you be more specific? I can only say it depends...

ritchie46
  • Do you chunk a dataframe with 100 rows into, say, 10 partitions of 10 rows, execute a method on each block like Spark and then collect/merge the results, or do you proceed by column (entirely, or on each chunk of a column)? – Devyl May 24 '22 at 13:52
  • It all depends on the algorithm: sometimes we partition vertically, very often we partition horizontally. Per operation we decide what the most natural partitioning is for our data/algorithm and the work overhead. That's why it depends. ;) Our user guide might give better intuition of where we parallelize. – ritchie46 May 24 '22 at 15:13
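As a rough illustration of the vertical (per-column) vs. horizontal (per-row-chunk) partitioning mentioned in the comments, here is a toy sketch that mimics the two strategies with Python's own thread pool on a small DataFrame. This is not how Polars is implemented internally; it only shows the difference between the two approaches:

```python
import concurrent.futures as cf
import polars as pl

df = pl.DataFrame({"a": list(range(100)), "b": list(range(100, 200))})

# Vertical partitioning: each column is handled as an independent task.
with cf.ThreadPoolExecutor() as pool:
    col_sums = list(pool.map(lambda s: s.sum(), [df[c] for c in df.columns]))

# Horizontal partitioning: the rows are split into chunks of 10,
# a partial result is computed per chunk, and the partials are merged.
chunks = [df.slice(i, 10) for i in range(0, df.height, 10)]
with cf.ThreadPoolExecutor() as pool:
    partials = list(pool.map(lambda chunk: chunk["a"].sum(), chunks))

print(col_sums)       # [4950, 14950]
print(sum(partials))  # 4950, same result as df["a"].sum()
```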