
I don't think I fully understand the relationship between object_store_memory and redis_max_memory. I initially thought that the redis db holds all the objects in memory, but then how can the object store size be made larger than the redis_max_memory size? Or are some parts of it written to disk? How is RAM consumed? Is RAM_CONSUMED = object_store_memory + redis_max_memory, or is it the union of it?

Muppet

1 Answer


TL;DR: Yes, RAM_CONSUMED = object_store_memory + redis_max_memory, unless you back the object store with disk (i.e. allow objects to spill to disk).
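The formula above is plain addition of the two budgets. A minimal sketch (the byte values here are illustrative examples, not Ray defaults):

```python
# Total RAM Ray reserves when the object store is NOT backed by disk,
# per the TL;DR above. Values are illustrative, not defaults.
object_store_memory = 4 * 10**9   # 4 GB for the plasma object store
redis_max_memory    = 1 * 10**9   # 1 GB for task/object metadata

ram_consumed = object_store_memory + redis_max_memory
print(ram_consumed)  # 5000000000, i.e. 5 GB
```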


More info

The redis DB holds only the metadata about objects and tasks (for tasks: which objects the task depends on, the IDs of the outputs it produces, and which function must run to produce them; for objects: on which node(s) in the cluster the object is stored). redis_max_memory caps the size of this database; old entries are discarded in LRU fashion.
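Because each entry is small, a back-of-the-envelope calculation shows why a modest redis_max_memory goes a long way. The ~300-bytes-per-entry figure below is an assumption (the comments under this answer cite "a few hundred bytes" per task/object):

```python
# Rough sizing sketch for redis_max_memory (pure arithmetic).
# Assumption: each task/object metadata entry is a few hundred bytes;
# we use 300 bytes as a stand-in average.
BYTES_PER_ENTRY = 300

redis_max_memory = 10**9  # 1 GB
max_entries = redis_max_memory // BYTES_PER_ENTRY
print(f"~{max_entries:,} task/object entries fit in 1 GB")
# -> roughly 3.3 million entries
```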

The actual data is stored in a shared-memory object store (see https://arrow.apache.org/docs/python/plasma.html), whose size is limited by object_store_memory. Again, old objects that are not currently mapped into any worker's memory are evicted from it in LRU order.

It is also possible to back the object store with disk by providing the _plasma_directory argument or the --plasma-directory switch (see the docs of ray.init()). This allows object stores larger than your RAM, but it also makes the object store slower, depending on the disk type and size and the amount of buffer cache available.
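Putting the switches together, starting a head node might look like the sketch below. The --plasma-directory switch is the one named in this answer; the exact names and availability of the other flags vary by Ray version, so treat this as an assumption and check `ray start --help` for your install:

```shell
# Hedged sketch, not a verified invocation for your Ray version.
# Point --plasma-directory at a real disk path (not tmpfs) to allow an
# object store larger than RAM, at the cost of speed.
ray start --head \
    --object-store-memory=4000000000 \
    --redis-max-memory=1000000000 \
    --plasma-directory=/mnt/ray-plasma
```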

Let me know if you have more questions.

mirekphd
Philipp Moritz
  • Thank you for that. What is a reasonable size for the redis DB? If it only holds metadata, it should suffice to keep it quite small? – Muppet Jun 19 '19 at 02:10
  • I'd say 10^9 (bytes) should be sufficient for most workloads. The metadata for tasks/objects is a few hundred bytes, so 10^9 allows you to store a few million tasks/objects at a time, which should be plenty. – Philipp Moritz Jun 19 '19 at 20:04
  • Hi, wonderful answer. Normally, what is a reasonable size for `object_store_memory`? Is there any rule of thumb? – GoingMyWay May 04 '20 at 05:58
  • I also found `driver_object_store_memory` in Ray, any suggestions on setting its value? – GoingMyWay May 04 '20 at 06:01