Python: JIT for known bottlenecks

Question

Is there any way to use somehow use pypy just to compile one function and not for the rest of my python program?

I have a known bottleneck where I spend 99% of my CPU time (containing mostly integer shifts and XORs) and have optimized it as much as I can in Python. I don't want to have to write and maintain C libraries unless absolutely positively necessary.

Right now I'm using Anaconda Python which is the normal python with a bunch of libraries. I would use pypy except that I don't want to have to make sure that all the rest of my program works ok w/ pypy.

Is there a way to explicitly run a JIT only on one Python function?

edit: the function is a modular multiplication step in GF2 (Galois field)

https://bitbucket.org/jason_s/libgf2/src/a71a14a035468e703a7c24349be814419cdba2cb/src/libgf2/gf2.py?at=default

specifically:

def _gf2mulmod(x,y,m):
    z = 0
    while x > 0:
        if (x & 1) != 0:
            z ^= y
        y <<= 1
        y2 = y ^ m
        if y2 < y:
            y = y2
        x >>= 1
    return z

It needs to work for bigints so I'm not sure how to rewrite to be Cython-compatible.

I just tried numba's @autojit and it failed because it didn't know what variable types to use and assumed small integers. I can't seem to figure out how to tell it to use standard Python bigints.

Traceback (most recent call last):
  File "/Users/jason_s/Documents/python/libgf2/src/libgf2/gf2.py", line 440, in <module>
    dlog43 = GF2DiscreteLog(0x100000000065)
  File "/Users/jason_s/Documents/python/libgf2/src/libgf2/gf2.py", line 295, in __init__
    factors = _calculateFactors(poly)
  File "/Users/jason_s/Documents/python/libgf2/src/libgf2/gf2.py", line 264, in _calculateFactors
    if (e1 << m).value != 1:
  File "/Users/jason_s/Documents/python/libgf2/src/libgf2/gf2.py", line 379, in __lshift__
    return self._wrapraw(_gf2lshiftmod(self.value,k,self.poly))
  File "/Users/jason_s/Documents/python/libgf2/src/libgf2/gf2.py", line 195, in _gf2lshiftmod
    return _gf2mulmod(x,_gf2powmod(2,k,m),m)
  File "/Users/jason_s/Documents/python/libgf2/src/libgf2/gf2.py", line 189, in _gf2powmod
    z = _gf2mulmod(z,x,m)
  File "numbawrapper.pyx", line 193, in numba.numbawrapper._NumbaSpecializingWrapper.__call__ (numba/numbawrapper.c:3764)
OverflowError: value too large to convert to signed int

Write it in [Cython](http://docs.cython.org/src/tutorial/cython_tutorial.html) and compile it into a C extension. The resulting Cython code probably won't look much different than your current code. — Blender, Oct 13 '13 at 21:18
How long is the function? Could you post the code? There are many ways to get around bottlenecks, and the best way depends on what exactly you are doing. — Sven Marnach, Oct 13 '13 at 21:23
I tried numba, couldn't get it to work. Anaconda Python doesn't seem to have Cython support. Blah. :-( — Jason S, Oct 13 '13 at 21:49
What integer range do you need to support? Would 64 bits be enough? — Sven Marnach, Oct 13 '13 at 21:56
64 bits enough in some cases, not enough in others. I realize I can optimize if I put a bit limit on it, but then I lose when I exceed that limit. — Jason S, Oct 13 '13 at 21:59
So you need arbitrarily long ints? This seems to be arithmetic in the polynomial ring over GF(2) instead of in GF(2) itself, right? — Sven Marnach, Oct 13 '13 at 22:04
>polynomial ring over GF(2) -- yes, I think (sorry, my terminology brain cells are confused) — Jason S, Oct 13 '13 at 22:09
My bad -- Anaconda does include Cython (or at least pyximport) since numba depends on it. Trying Cython.... — Jason S, Oct 13 '13 at 22:13

score 7 · Accepted Answer · answered Oct 13 '13 at 21:36

7

What about using Cython instead? You could convert just this one function to cython syntax, which is than compiled directly to C or so. The syntax should be close enough to python itself, probably just add a few declarations of the right types.

answered Oct 13 '13 at 21:36

Bas Swinckels

18,095
3
45
62

1

OK, it worked once I installed numba (which includes Cython); the .pyx version of the function is about 2.5x faster than my Pure Python version; not bad, maybe less nice than I'd hope though. :-) – Jason S Oct 13 '13 at 22:21

score 6 · Answer 2 · 2013-10-13T21:52:08.050

No, you cannot run part of a Python program in PyPy and other parts in another Python- It's more than just a JIT, it has a completely different representation of objects and many other internals.

If your sole concern is not wanting to make sure the rest of the program works with PyPy, rest assured: Virtually all pure Python code works with PyPy, with the sole exception of CPython implementation details. Those are obscure, it's quite hard to accidentially write code that relies on most of them, and the others (e.g. files not being closed automatically as promptly) won't break most programs. Just try running your whole program with PyPy.

If there are other issues with PyPy, you might want to translate only this function into C and call it using ctypes or cffi. The annoying part would be hooking it up with Python (e.g. via an extension module), which is what ctypes and cffi do for you. You don't need a complete arbitary precision integer library, you only need an array of bits with a few very simple operations: testing the least significant bit, left/right shifting, less-than, and bitwise XOR. Each of those is only one trivial loop. And if the naive C implementation is still a bottleneck, you can probably vectorize all of these loops. You could probably also optimize the shifts to avoid copying anything at all.

I like this answer because it's worth pointing out that PyPy is very nearly a drop-in replacement for Python 2.7, so almost any pure Python 2 program will work using PyPy, with no changes whatsoever. What the OP doesn't make clear is how much he's depending on various modules or packages; if any of those are not pure Python (and this includes some standard library modules), then most likely they won't work with PyPy. (Indeed, most such packages won't even work with different versions of CPython.) That's when C, `ctypes`, and `cffi` come into the picture, which you've also mentioned. — John Y, Oct 17 '13 at 15:38
Per my original post: "I am using **Anaconda Python** which is the normal python with a bunch of libraries" (my emphasis). I can't depend on pypy supporting all these packages, and even if I could, I'm working with people who aren't clever about installation so I am using Anaconda because it makes the process easy. — Jason S, Oct 24 '13 at 21:34

John Y · Answer 3 · 2016-08-08T14:07:24.453

Cython has already been mentioned a few times, but I'd like to point out another alternative for those who find this question in their search, who might not have the arbitrary-precision integer requirement: ShedSkin. It translates Python to C++ via type inference. In other words, unlike Cython, you don't add type declarations to Python; you just write Python. But you do have to make sure your variables don't change type within your program, otherwise the type cannot be inferred. Certain other of the more dynamic features of Python are also not supported, but this is usually not an issue for computation-intensive code.

As for the arbitrary-precision integer requirement: You will likely get much, much bigger speed improvements than the 2.5x you noted in your comment, for integers that fit in "native" (fixed-size) integers. The improvement is so drastic that if a good portion of your calculations can be done with native integers, it is probably worth having a (blazingly fast) native-integer version of your function just for that case, and only using the general version of your function for values that don't fit. (Not all problems can be broken into separate cases like this, but for those that can, it's worth at least checking out this approach.)

score 1 · Answer 4 · answered Apr 23 '21 at 18:40

I actually created a python package that does exactly what you are trying to accomplish. It implements numpy arrays over Galois fields. I was able to optimize it by JIT compiling with numba. It also supports arbitrarily-large integers, like you mentioned. https://github.com/mhostetter/galois

In [1]: import numpy as np                                                                     

In [2]: import galois                                                                          

In [3]: GF = galois.GF(2**100)                                                                 

In [4]: print(GF.properties)                                                                   
GF(2^100):
  characteristic: 2
  degree: 100
  order: 1267650600228229401496703205376
  irreducible_poly: Poly(x^100 + x^57 + x^56 + x^55 + x^52 + x^48 + x^47 + x^46 + x^45 + x^44 + x^43 + x^41 + x^37 + x^36 + x^35 + x^34 + x^31 + x^30 + x^27 + x^25 + x^24 + x^22 + x^20 + x^19 + x^16 + x^15 + x^11 + x^9 + x^8 + x^6 + x^5 + x^3 + 1, GF(2))
  is_primitive_poly: True
  primitive_element: GF(2, order=2^100)

In [5]: A = GF.Random((2,2)); A                                                                
Out[5]: 
GF([[853109427014456778157610146134, 957579797461316189986596054496],
    [619061399056446662243502059294, 762244553049699515226100523552]],
   order=2^100)

In [6]: B = GF.Random((2,2)); B                                                                
Out[6]: 
GF([[511709585928908961018234228239, 206374347029181859532039074035],
    [795530021671674913470994904012, 918203712488921499667394325749]],
   order=2^100)

In [7]: A + B                                                                                  
Out[7]: 
GF([[1005796869832339943233379227481, 1152773228746217240881872950547],
    [1097497412292991510222532386002, 160953539027946640002225213141]],
   order=2^100)

In [8]: A * B                                                                                  
Out[8]: 
GF([[296314095771552265037299061152, 688536035673482273067277820628],
    [1177970297984569800118703939222, 537328370564266356643331706738]],
   order=2^100)

In [9]: A / A                                                                                  
Out[9]: 
GF([[1, 1],
    [1, 1]], order=2^100)

# Fermat's Little Theorem
In [10]: A ** (GF.order - 1)                                                                   
Out[10]: 
GF([[1, 1],
    [1, 1]], order=2^100)

In [11]: A @ B                                                                                 
Out[11]: 
GF([[1027776659503691614378629238339, 470187664463292378234435322369],
    [86178777179053209582733631256, 172677144553521647820627674227]],
   order=2^100)

In [12]: np.linalg.inv(A) @ A                                                                  
Out[12]: 
GF([[1, 0],
    [0, 1]], order=2^100)

Python: JIT for known bottlenecks

4 Answers4