I'm writing test-bench Python scripts that exercise algorithms written by several developers in a variety of environments, all driven through IPython's parallel (distributed cluster) capabilities. One environment of interest is Matlab, but the resource scenario is generic enough that the answer should generalize.
To invoke the prototype Matlab scripts I've decided to use the Matlab engine for Python. The engine objects will be instantiated in Python, but rather than spinning one up per work unit I'd like to reuse them wherever possible, to avoid the overhead of creating and destroying those heavyweight objects.
I'm still early in reading the IPython parallel documentation, but I'm asking now so that I know which parts of the documentation to pay attention to as I come across them. I may even be wrong in assuming that I need multiple Matlab engines to avoid synchronization or thread-safety issues with a single engine instance.
The general question: if I have a resource that can and should be reused, analogous to something kept in thread-local storage, how do I arrange for tasks and jobs to reuse it as they receive each new work unit? Once there are no more units of work, the resource can be wound down and destroyed if necessary.
1 Answer
Haven't figured it all out entirely but I'll document here what I've found thus far.
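After creating the IPython engines and connecting to them from a client through a direct view, I can execute any old Python code in the Python instance embedded in each engine via the view's execute method (run expects the path of a script file rather than a string of code). This includes creating a Matlab engine for each IPython engine:

    from IPython import parallel  # packaged separately as ipyparallel since IPython 4

    rc = parallel.Client()
    dview = rc[:]  # DirectView over all engines; dview.execute(...) would broadcast in one call
    for engine_id in rc.ids:
        ipEngine = rc[engine_id]  # DirectView on a single engine
        ipEngine.execute("import matlab.engine", block=True)
        ipEngine.execute("eng = matlab.engine.start_matlab('-noFigureWindows')", block=True)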
I still need to try this out and work out the variable scoping rules in this distributed environment: how the engine namespaces relate to the environment kept on the client, and how to handle any namespace conflicts that arise. Still, this looks promising for getting things set up in each engine's embedded Python instance. I'll update this answer as my development progresses.
There's also pushing and pulling, which passes Python objects between the client and the engines, although I'm not sure how that would work in practice with a Matlab engine object and the namespace question above, or with respect to the remote computer's environment, since Matlab would have to be installed there.
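One pattern that sidesteps shipping the Matlab engine object itself is to leave it in each engine's namespace and only push plain-Python arguments and pull plain-Python results. The sketch below assumes `view` behaves like a `DirectView` (its `push`, `execute` and `pull` methods), that a variable named `eng` already exists on every engine, and that the result can be pickled for the trip back. `call_matlab_remotely` and the `_ml_*` names are hypothetical.

```python
def call_matlab_remotely(view, func_name, *args):
    """Call eng.<func_name>(*args) on every engine behind `view`.

    Assumes each engine already holds a Matlab engine in a variable named
    `eng`, and that `view` offers DirectView-style push/execute/pull.
    """
    # Ship only plain data to the engines, never the Matlab engine itself.
    view.push({"_ml_func": func_name, "_ml_args": args}, block=True)
    # Run the call where `eng` lives and keep the result in the remote namespace.
    view.execute("_ml_result = getattr(eng, _ml_func)(*_ml_args)", block=True)
    # Bring the results back to the caller.
    return view.pull("_ml_result", block=True)
```

With a real multi-engine view, something like `call_matlab_remotely(dview, "sqrt", 16.0)` should come back as a list with one entry per engine. Whether Matlab-specific return types survive the transfer is something I have not checked.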
There is also a known limitation of the Matlab Engine for Python worth stating here, found in Ch. 8, "MATLAB Engine for Python Topics", of the Matlab External Interfaces PDF document:
Limitations to MATLAB Engine for Python
- The MATLAB Engine for Python is not thread-safe
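Given that limitation, the options seem to be one Matlab engine per process (each IPython engine is its own process, so one Matlab engine per IPython engine fits) or, if several threads in one process must share an engine, serializing every call through a lock. Below is a rough sketch of the second option; `SerializedResource` and `FragileCounter` are hypothetical names, and `FragileCounter` merely simulates an object that breaks under overlapping calls.

```python
import threading
import time


class SerializedResource:
    """Wrap a non-thread-safe object so only one thread calls into it at a time."""

    def __init__(self, resource):
        self._resource = resource
        self._lock = threading.Lock()

    def call(self, method_name, *args, **kwargs):
        # Hold the lock for the whole call so calls can never overlap.
        with self._lock:
            return getattr(self._resource, method_name)(*args, **kwargs)


class FragileCounter:
    """Hypothetical non-thread-safe resource that records overlapping calls."""

    def __init__(self):
        self.busy = False
        self.overlaps = 0
        self.calls = 0

    def work(self):
        if self.busy:
            self.overlaps += 1  # another thread was already inside work()
        self.busy = True
        time.sleep(0.001)  # widen the window in which an overlap could occur
        self.calls += 1
        self.busy = False


counter = FragileCounter()
shared = SerializedResource(counter)
threads = [threading.Thread(target=shared.call, args=("work",)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The lock keeps things correct but also means the threads take turns, so it buys safety rather than parallelism.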
- Looks like you could also use `with dview.sync_imports(): import matlab.engine` as an alternative way to push imports out to the clients. Would `ipEngine.execute()` do something different than `ipEngine.run()`? Was looking at [this post from MinRK](https://minrk.github.io/scipy-tutorial-2011/basic_remote.html#execute-and-run). – Roland Jun 11 '15 at 22:46
- That's a good technique too. I've yet to figure out how it relates to Matlab installations on my cluster; I'm hoping to avoid having to get licenses for all cluster machines if possible, or at least to minimize the number of licenses needed. I know IPython would push functions over to engines through the ZeroMQ protocol, but if those functions have internal needs that are only satisfied on the client, I don't imagine the full dependencies of a function are pushed to the remote engines as well. Unfortunately I have a bunch of refactoring prerequisites to do to the Matlab before I start experimenting. – jxramos Jun 12 '15 at 20:36
- I agree that you won't be able to get access to multiple, simultaneous instances of a license-controlled software (it'll either be node-locked or the license manager will refuse to turn on extra instances). – Roland Jun 15 '15 at 20:01