
I'm trying to capitalize on Ray's parallelization model to process a file record by record. The code works beautifully, but the object store grows quickly and eventually crashes. I'm avoiding ray.get(function.remote()) because it kills performance: the job is composed of a few million subtasks, and the overhead of waiting for each task to finish dominates. Is there a way to set a global limit on the object store?

# Code that constantly backpressures the object store, freeing space,
# but makes performance worse than serial execution
for record in infile:
    ray.get(createNucleotideCount.remote(record, copy.copy(dinucDict), copy.copy(tetranucDict), dinucList, tetranucList, filename))

# Code that maximizes throughput but makes the object store grow without bound
for record in infile:
    createNucleotideCount.remote(record, copy.copy(dinucDict), copy.copy(tetranucDict), dinucList, tetranucList, filename)

# The called function returns either 0 or 1.
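A middle ground between the two loops above is to keep only a bounded window of tasks in flight and block when the window is full. With Ray this is usually written with ray.wait(pending, num_returns=1); the sketch below shows the same sliding-window idea with only the standard library so it is self-contained. process_record and MAX_IN_FLIGHT are illustrative names, not from the original code.

```python
# Bounded-in-flight ("sliding window") backpressure pattern, sketched with
# concurrent.futures. In Ray, replace pool.submit(...) with
# createNucleotideCount.remote(...) and concurrent.futures.wait(...) with
# ray.wait(pending, num_returns=1).
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

MAX_IN_FLIGHT = 8  # cap on outstanding tasks; tune to your memory budget

def process_record(record):
    # Stand-in for the real remote function, which returns 0 or 1.
    return len(record) % 2

def run(records):
    results = []
    pending = set()
    with ThreadPoolExecutor(max_workers=4) as pool:
        for record in records:
            if len(pending) >= MAX_IN_FLIGHT:
                # Block until at least one task finishes before submitting more,
                # so memory held by in-flight tasks stays bounded.
                done, pending = wait(pending, return_when=FIRST_COMPLETED)
                results.extend(f.result() for f in done)
            pending.add(pool.submit(process_record, record))
        # Drain whatever is still in flight.
        done, _ = wait(pending)
        results.extend(f.result() for f in done)
    return results
```

The window size trades memory for throughput: a window of 1 degenerates into the serial ray.get loop, while an unbounded window is the second loop that fills the object store.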
LipeScia
1 Answer


You can do ray.init(object_store_memory=10**9) to limit the object store to use 1 GB of your system RAM (as opposed to all of it by default).

object_store_memory – The amount of memory (in bytes) to start the object store with. By default, this is automatically set based on available system memory.

(see docs on ray.init())

There is more information in the documentation on memory management at https://docs.ray.io/en/releases-1.11.0/ray-core/memory-management.html.
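As a configuration fragment, the suggested cap would sit at the top of the script, before any tasks are submitted (the 10**9 figure is just the 1 GB example from above; tune it to your machine):

```python
import ray

# Cap the object store at 1 GB instead of the automatic,
# system-memory-based default.
ray.init(object_store_memory=10**9)
```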

mirekphd
Robert Nishihara