
I want to issue HTTP requests in parallel, and here is what my code (skeleton) looks like when using Ray:

import ray
import requests

ray.init()

@ray.remote
def issue_request(user_id):
    # url, payload, and headers are defined elsewhere
    r = requests.post(url, json=payload, headers=headers)

ray.get([issue_request.remote(id_token[user_id]) for user_id in range(500)])

This runs much slower than the following:

import multiprocessing
import requests

def issue_request(user_id):
    r = requests.post(url, json=payload, headers=headers)

jobs = []
for i in range(500):
    process = multiprocessing.Process(target=issue_request,
                                      args=(admin_id,))
    jobs.append(process)

for j in jobs:
    j.start()

# Ensure all of the processes have finished
for j in jobs:
    j.join()

The machine has two cores, and it seems that Ray only starts two worker processes to handle the 500 requests. Can someone please tell me how to make Ray start one worker/process per request?

user1274878
    You can do `ray.init(num_cpus=10)` to tell Ray to schedule up to 10 tasks concurrently. Starting 500 processes simultaneously would probably be excessive. In the multiprocessing case, the processes are exiting once they finish, so you probably never have 500 around concurrently. – Robert Nishihara May 16 '19 at 15:42

1 Answer


You can do `ray.init(num_cpus=10)` to tell Ray to schedule up to 10 tasks concurrently. There is more information about resources in Ray at https://ray.readthedocs.io/en/latest/resources.html.
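
For example, applied to the code in the question, a minimal sketch (assuming the `issue_request` task and the `id_token` mapping from the question are already defined) would be:

import ray

# Give Ray 10 CPU "slots" so up to 10 tasks run concurrently,
# even on a machine with only two physical cores.
ray.init(num_cpus=10)

# The 500 tasks are queued and executed 10 at a time by the worker pool.
results = ray.get([issue_request.remote(id_token[u]) for u in range(500)])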

By default, Ray will infer the number of cores using something like `os.cpu_count()`.
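
In other words, the default is roughly equivalent to the following (an approximation for illustration, not Ray's exact internal logic):

import os
import ray

# On a two-core machine this gives num_cpus=2, i.e. two concurrent tasks.
ray.init(num_cpus=os.cpu_count())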

Starting 500 processes simultaneously would probably be excessive. In the multiprocessing case, the processes are exiting once they finish, so you probably never have 500 around concurrently.
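
If you want the multiprocessing version to use a fixed, long-lived set of workers instead of spawning 500 short-lived processes, a `multiprocessing.Pool` is the usual pattern (a sketch, not part of the original answer; `url`, `payload`, and `headers` are assumed to be defined as in the question):

import multiprocessing
import requests

def issue_request(user_id):
    r = requests.post(url, json=payload, headers=headers)
    return r.status_code

if __name__ == "__main__":
    # 10 worker processes handle all 500 requests between them.
    with multiprocessing.Pool(processes=10) as pool:
        results = pool.map(issue_request, range(500))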

Robert Nishihara