
I'd like to use this package as a data backend to expose an API/website with data analysis.

How is parallelization done in this package? Is it possible to control the resources consumed?

Br

Vaidehi Jamankar
Devyl

1 Answer


Is it possible to control the resources consumed?

You can set the POLARS_MAX_THREADS environment variable. It is used to set the size of the thread pool at startup.

See all configurable env vars here: https://pola-rs.github.io/polars/polars/index.html#config-with-env-vars
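For example, a minimal sketch of wiring this into a Python process (the value 4 is just a placeholder; you can equally set the variable in the shell before launching your app):

```python
import os

# POLARS_MAX_THREADS must be set before polars is imported:
# the thread pool is sized once, at startup.
os.environ["POLARS_MAX_THREADS"] = "4"

import polars as pl

# Check the size of the global thread pool. The helper is named
# pl.threadpool_size() in older releases; newer versions may expose
# it as pl.thread_pool_size().
print(pl.threadpool_size())
```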

How is parallelization done in this package?

Can you be more specific? I can only say it depends...

ritchie46
  • Do you chunk a dataframe with 100 rows into, say, 10 partitions of 10 rows, execute a method on each block like Spark and then collect/merge the results, or do you proceed by column (entirely, or on each chunk of a column)? – Devyl May 24 '22 at 13:52
  • It all depends on the algorithm: sometimes we partition vertically, very often we partition horizontally. Per operation we decide what the most natural partitioning is for our data/algorithm and the work overhead. That's why it depends. ;) Our user guide might give better intuition of where we parallelize. – ritchie46 May 24 '22 at 15:13
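As a rough illustration of the vertical (per-column) vs. horizontal (per-row-chunk) partitioning mentioned in the comments, here is a toy sketch that mimics the two strategies with Python's own thread pool on a small DataFrame. This is not how Polars is implemented internally; it only shows the difference between the two approaches:

```python
import concurrent.futures as cf
import polars as pl

df = pl.DataFrame({"a": list(range(100)), "b": list(range(100, 200))})

# Vertical partitioning: each column is handled as an independent task.
with cf.ThreadPoolExecutor() as pool:
    col_sums = list(pool.map(lambda s: s.sum(), [df[c] for c in df.columns]))

# Horizontal partitioning: the rows are split into chunks of 10,
# a partial result is computed per chunk, and the partials are merged.
chunks = [df.slice(i, 10) for i in range(0, df.height, 10)]
with cf.ThreadPoolExecutor() as pool:
    partials = list(pool.map(lambda chunk: chunk["a"].sum(), chunks))

print(col_sums)       # [4950, 14950]
print(sum(partials))  # 4950, same result as df["a"].sum()
```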