1

Is there a way to import modules from a CPython script, but run them in PyPy?

The problem is that I have code that uses SciPy (and NumPy) heavily, but there are parts of it that could be optimized with PyPy.

Here's a random example of what I would like to do:

sub_run_in_pypy.py module:

#assuming this can be optimized with PyPy
def function_a(foo):
    return foo**2

main_run_in_cpython.py module:

import scipy.stats as stats

#assuming this exists:
from import_function_for_pypy import import_function_for_pypy
pypy_imported_function = import_function_for_pypy(module_name="sub_run_in_pypy", function_name="function_a")

x = stats.t.rvs(5, loc=0, scale=1, size=1)

print pypy_imported_function(x)

If this does not exist, why not?

Edit: As Bakuriu inferred, I was suggesting it could potentially be something that runs in a separate process. Would this add too much overhead?

TimY
  • I don't understand. Do you want to run some code as CPython and some code as PyPy? Then you must use two different processes, one with PyPy and one with CPython, but this will add a lot of overhead to communicate between processes. – Bakuriu Apr 08 '13 at 11:47
  • Yes, that's what I was suggesting - would the overhead really reduce the performance that much? (I'm sorry that was unclear) – TimY Apr 08 '13 at 11:49
  • It depends. Multiprocessing allows you to add some parallel computation which can increase the speed of the computation on multicore machines, but you must be careful to avoid many little communications between processes. Avoid sending small "information packets", try to do batch communications as much as possible. We can't say anything more without the specific code. You should probably try and *profile* (at least with some sample code). – Bakuriu Apr 08 '13 at 12:10
  • I see. So you're suggesting this is not a mainstream practice due to the fact that the efficiency is code-specific. Unfortunately, I cannot post the code since it is too long and complex, but thank you for the offer. I will have a look at what I can do. – TimY Apr 08 '13 at 12:59
  • 1
    Generally you have to consider that every communication between processes is something like 1000 or more times slower than a normal operation. Depending on the task, it might or might not be possible to "dilute" these heavy operations among the others to obtain a speed-up. Other tasks are intrinsically serial and, no matter what, you won't obtain big speed-ups from parallelization. So yes, it highly depends on your problem. There are some conventional ways of doing this, e.g. pipes, RPCs, signals or sockets, so you ought to take a look at the [standard library](http://docs.python.org/2/library/ipc.html). – Bakuriu Apr 08 '13 at 14:50

1 Answer

0

Since I stumbled over this question long before the related thread, note that the reverse mechanism is described briefly there (Can I embed CPython inside PyPy?).

The basic idea is to start a PyPy interpreter alongside the CPython interpreter (or vice versa) and connect them via inter-process communication. While you might be tempted to do this over raw pipes or sockets, it is highly recommended to use a higher-level library, such as execnet (which is actually used for this purpose).
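
To make that concrete, here is a minimal sketch using execnet. It assumes execnet is installed and that a pypy executable can be found on the PATH; the worker code simply mirrors function_a from the question:

import execnet

# start a PyPy subprocess from the CPython process
# ("popen//python=pypy" assumes `pypy` is on the PATH)
gw = execnet.makegateway("popen//python=pypy")

# this source string is executed inside the PyPy interpreter;
# it talks to the CPython side through `channel`
channel = gw.remote_exec("""
    def function_a(foo):
        return foo ** 2

    while True:
        foo = channel.receive()
        if foo is None:          # sentinel: shut the worker down
            break
        channel.send(function_a(foo))
""")

channel.send(3.0)
print(channel.receive())         # 9.0

channel.send(None)               # stop the worker
gw.exit()

Note that execnet channels only transfer simple built-in types (numbers, strings, lists, dicts and the like), so a NumPy array would have to be converted, e.g. with tolist(), before being sent to the PyPy side and rebuilt afterwards.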

If you go for a low-level approach instead, be sure to decide early whether and how you will handle architectures such as multi-threaded or multi-process execution, whether you want asynchronous computation, or even a master-worker setup.
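
For comparison, a bare-bones version of such a low-level setup could look like the following sketch (pypy_worker.py and the one-JSON-document-per-line protocol are made up purely for illustration):

pypy_worker.py, run under PyPy:

import sys
import json

def function_a(foo):
    return foo ** 2

# read one JSON-encoded request per line, write one JSON-encoded reply per line
while True:
    line = sys.stdin.readline()
    if not line:
        break
    sys.stdout.write(json.dumps(function_a(json.loads(line))) + "\n")
    sys.stdout.flush()

main script, run under CPython:

import json
import subprocess

# start the worker under PyPy and talk to it over stdin/stdout pipes
proc = subprocess.Popen(["pypy", "pypy_worker.py"],
                        stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                        universal_newlines=True)

proc.stdin.write(json.dumps(3.0) + "\n")
proc.stdin.flush()
print(json.loads(proc.stdout.readline()))   # 9.0

proc.stdin.close()
proc.wait()

Even this tiny example already has to settle framing, flushing and shutdown by hand, which is exactly what a library like execnet takes care of. And as the comments above point out, every round trip is expensive, so batching many inputs into a single message is usually worthwhile.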

MisterMiyagi