I’m trying to figure out whether Ray will work for an application, and in particular how dependencies get to the workers in a Ray cluster. For example, let’s say I have:
@ray.remote
def foo():
    a = do_something_requiring_pandas()
    b = do_something_requiring_openmpi()
    return a + b
How do I make sure the workers have access to pandas (a third-party Python package) and openmpi (a non-Python package usually installed via the OS package manager)? Do I just have to ensure that the workers have them installed “out of band” from Ray? Or does Ray do some automagic packaging of dependencies that it sends to the worker along with the task? (I can see how that could work in the pandas case, but not the openmpi one.) I don’t actually care about pandas or openmpi specifically; they’re just handy examples of two different categories of dependency.