I am totally new to Ray and have a question about whether it could be a good fit for my problem.
I am optimising an image modelling code and have it running well on a single machine using multi-threaded numpy operations.
Each image generation is a single serial job whose numpy work scales across the cores of one node.
What I’d like to do is scale each of these locally parallel jobs across multiple nodes.
Before refactoring, the code was parallelised at a high level: single-image computations, each of which is serial, were run in parallel. I would like to replicate this behaviour across multiple nodes. Essentially this would be batch-running a number of independent jobs across multiple nodes, each job computing a single image on its own node. The computations are independent of each other; the only communication is sending the parameters at the start (small) and returning the image arrays at the end (large).
As mentioned, the original parallel implementation used joblib to parallelise the serial image computations over local CPUs, with each image calculation on a separate CPU. Now I want to replicate this with one image calculation process per node instead, each of which will then multi-thread across that compute node.
So my idea is to try Ray's joblib backend to control this process. This is the previous high-level joblib call for running multiple serial image computations in parallel.
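(Sketched below with simplified placeholder names, compute_single_image, param_list and n_cpus, rather than the real code.)

import numpy as np
from joblib import Parallel, delayed

def compute_single_image(params):
    # Placeholder for the serial, internally multi-threaded image computation
    return np.zeros((512, 512)) + params["brightness"]

if __name__ == "__main__":
    param_list = [{"brightness": b} for b in range(8)]  # placeholder inputs
    n_cpus = 8

    # One serial image computation per local CPU core
    images = Parallel(n_jobs=n_cpus)(
        delayed(compute_single_image)(p) for p in param_list
    )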
I believe I can just encapsulate the above call with:
with joblib.parallel_backend('ray'):
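From the docs it looks like the backend also has to be registered and the cluster connected first, so the full version of the sketch above would become something like this (still with placeholder names, and assuming a Ray cluster is already running):

import joblib
import numpy as np
import ray
from joblib import Parallel, delayed
from ray.util.joblib import register_ray

def compute_single_image(params):
    # Placeholder for the serial, internally multi-threaded image computation
    return np.zeros((512, 512)) + params["brightness"]

if __name__ == "__main__":
    # Connect to an already running Ray cluster (e.g. started with `ray start` on each node)
    ray.init(address="auto")

    # Register the 'ray' joblib backend
    register_ray()

    param_list = [{"brightness": b} for b in range(8)]  # placeholder inputs

    # Same high-level call as before, now dispatched as Ray tasks across the cluster
    with joblib.parallel_backend('ray'):
        images = Parallel(n_jobs=-1)(
            delayed(compute_single_image)(p) for p in param_list
        )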
The above loop is actually called inside a method of a class, and the image computation uses the class's self to pass around variables and arrays. Is there anything I have to do with actors to preserve this state?
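For concreteness, the structure is roughly like this (again heavily simplified, with placeholder names and shapes):

import numpy as np
from joblib import Parallel, delayed

class ImageModel:
    def __init__(self, config, base_array):
        # Parameters and large arrays live on self and are used by every image computation
        self.config = config          # small parameter dict
        self.base_array = base_array  # large numpy array

    def compute_single_image(self, params):
        # Serial, internally multi-threaded numpy computation that reads self state
        return self.base_array * params["brightness"] + self.config["offset"]

    def compute_all_images(self, param_list, n_jobs=-1):
        # The high-level parallel loop I want to move onto the Ray backend
        return Parallel(n_jobs=n_jobs)(
            delayed(self.compute_single_image)(p) for p in param_list
        )

if __name__ == "__main__":
    model = ImageModel({"offset": 0.1}, np.zeros((512, 512)))
    images = model.compute_all_images([{"brightness": b} for b in range(8)])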
Any thoughts or pointers would be greatly appreciated.