
I have a remote class like the following (using the Ray API):

import ray

@ray.remote
class className:
    # methods of the class
    ...

And I want to start 60 or more instances of this class and let them do some work simultaneously.

However, I can't start more than 50 instances of this class at the same time.

How can I change the maximum number of threads allowed at any given time from inside the Python script?
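For illustration, this is roughly how I start the actors (do_work and its body are placeholders for the actual work):

import ray

ray.init()

@ray.remote
class className:
    def do_work(self):  # placeholder method; the real work goes here
        ...

# start 60 actors and let them work simultaneously
actors = [className.remote() for _ in range(60)]
results = ray.get([a.do_work.remote() for a in actors])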

Our
  • Hey. Did you find a way to solve this? The answer below doesn't seem to solve it! – Yahya Dec 06 '21 at 12:51
  • @Yahya Unfortunately not, sorry. – Our Dec 06 '21 at 20:30
  • Same here. I am running it on Windows 10; if you're doing the same, note that this seems to be a bug in Windows in particular. – Yahya Dec 06 '21 at 23:30
  • @Our why can't you run more than 50 instances? How do you try to execute it and what's the error message you get? – HagaiA Dec 26 '22 at 13:46

1 Answer


I believe you need to use Custom Resources (see the Ray documentation on custom resources for more detail).

The idea is that you first provide a dictionary to the resources argument of ray.init. Each key of the dictionary is the name you give to a custom resource, and the associated value is the total amount of that resource available on the node; whatever value you choose, you can think of it as representing 100%. It is usually helpful to pick values that relate to the execution of specific tasks/actors. In your case, you want 50 actors from the same class executing at the same time, so 50 makes the most sense.

ray.init(resources={'Custom': 50})

Now, resources is also an argument for @ray.remote. It similarly requires a dictionary, equivalent to the one provided to ray.init. So let's say you have your class definition:

@ray.remote
class MyClass(object):
    # Methods
    ...

You can limit how many actors of this class execute concurrently by giving @ray.remote a custom resource requirement, which is weighed against the total declared in ray.init. The value must be an integer unless it is lower than one; dividing the value given in @ray.remote by the corresponding one in ray.init and multiplying by 100 gives the percentage of this custom resource that each task/actor claims. In your case, you want a limit of 50 actors, and we set Custom to 50 in ray.init. Hence, if each actor requires 1 unit of Custom, only 50 actors will be able to run at the same time.

@ray.remote(resources={'Custom': 1})
class MyClass(object):
    # Methods
    ...

No more than 50 actors of this class can now concurrently execute.
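Putting it together, here is a minimal end-to-end sketch; the do_work method and its body are placeholders for whatever your actors actually do:

import ray

# Declare 50 units of the custom resource 'Custom' on this node.
ray.init(resources={'Custom': 50})

# Each actor claims 1 unit of 'Custom', so at most 50 actors of this
# class can hold the resource (and therefore run) at any given time.
@ray.remote(resources={'Custom': 1})
class MyClass(object):
    def do_work(self, i):  # placeholder method for illustration
        return i * i

# All 50 actors fit; a 51st would stay pending until a unit of
# 'Custom' is freed, e.g. by terminating an earlier actor.
actors = [MyClass.remote() for _ in range(50)]
print(ray.get([a.do_work.remote(i) for i, a in enumerate(actors)]))

If in doubt, ray.cluster_resources() and ray.available_resources() report how much of 'Custom' has been declared and how much is currently free.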

Patol75
  • "No more than 50 actors of this class can now concurrently execute.": But I want exactly the opposite of this; being able to run more than 50 actors concurrently. – Our Aug 26 '20 at 04:36
  • Oh, OK, well it seems I misunderstood your question then. Can you edit your question and better explain what you mean by "However, I can't start more than 50 instances of this class at the same time."? Thank you. – Patol75 Aug 26 '20 at 04:40
  • I'll try, but in the meantime, this is the error I am getting from your suggested solution: ... – Our Aug 26 '20 at 04:52
  • "worker.py:1047 -- The actor or task with ID ffffffffffffffffef0a6c220100 is pending and cannot currently be scheduled. It requires {Custom: 60.000000}, {CPU: 1.000000} for execution and {Custom: 60.000000}, {CPU: 1.000000} for placement, but this node only has remaining {memory: 1.220703 GiB}, {CPU: 3.000000}, {object_store_memory: 0.390625 GiB}, {node:10.18.235.95: 1.000000}, {Custom: 40.000000}. In total there are 0 pending tasks and 1 pending actors on this node. This is likely due to all cluster resources being claimed by actors." – Our Aug 26 '20 at 04:52
  • "To resolve the issue, consider creating fewer actors or increase the resources available to this Ray cluster. You can ignore this message if this Ray cluster is expected to auto-scale" – Our Aug 26 '20 at 04:52
  • I am sorry, there was indeed a mistake in the values given to `Custom` in both `ray.init` and `@ray.remote`. It should run now. :) – Patol75 Aug 26 '20 at 05:51