Trying to calculate large numbers in Python with gmpy. Python keeps crashing?

Question

I was recommended to use gmpy to assist with calculating large numbers efficiently. Before I was just using python and my script ran for a day or two and then ran out of memory (not sure how that happened because my program's memory usage should basically be constant throughout.. maybe a memory leak?)

Anyways, I keep getting this weird error after running my program for a couple seconds:

mp_allocate< 545275904->545275904 >
Fatal Python error: mp_allocate failure

This application has requested the Runtime to terminate it in an unusual way. 
Please contact the application's support team for more information.

Also, python crashes and Windows 7 gives me the generic python.exe has stopped working dialog.

This wasn't happening with using standard python integers. Now that I switch to gmpy I am getting this error just seconds in to running my script. I thought gmpy was specialized in dealing with large number arithmetic?

For reference, here is a sample program that produces the error:

import gmpy2

p = gmpy2.xmpz(3000000000)
s = gmpy2.xmpz(2)
M = s**p

for x in range(p):
    s = (s * s) % M

I have 10 gigs of RAM and without gmpy this script ran for days without running out of memory (still not sure how that happened considering s never really gets larger..

Anyone have any ideas?

EDIT: Forgot to mention I am using Python 3.2

@RichieHindle is right; also, if you're using Python 2, use `xrange` instead of `range`. In Python 2, `range` returns a list, which in this case would contain 3,000,000,000 elements, a bit much even for 10 GB of RAM. — Tom Zych, Oct 03 '11 at 22:18
@TomZych: I'm using Python 3 and apparently xrange does not exist? — Ryan Peschel, Oct 03 '11 at 22:48
Then it's the answer in @RichieHindle's comment, which has unaccountably disappeared. He pointed out that `2**3000000000` is probably too big to fit in your machine's address space. Try it with one billion instead and see if that works. — Tom Zych, Oct 03 '11 at 22:52
@Tom Zych: I removed the comment because it was technically wrong (something like "2^3G won't fit in a 32-bit address space", which is actually not true - I was muddling bits and bytes). But it seems the problem *is* due to the "small" address space of 32-bit Python. — RichieHindle, Oct 04 '11 at 09:08

score 7 · Accepted Answer · edited Oct 07 '12 at 09:34

Disclaimer: I'm the maintainer of gmpy and gmpy2.

I won't be able to test this until tonight. But a couple of comments and questions.

Instead of using (s * s) % M, use pow(s, 2, M). It should be faster.

What happens if you use gmpy2.mpz() instead of gmpy2.xmpz() ?

Are you running a 64-bit version of Python and gmpy2? (I assume so, but I just want to confirm.)

Regarding range vs. xrange, in Python 3.x, range has replaced xrange.

Edit with additional information.

The cause of the crash was due to overflow of internal structures in a 32-bit build. Using a 64-bit version of Python and gmpy or gmpy2 is the proper fix.

The unpack(x,n) function is similar to split() for a string: it divides a number into a series of n-bit values. It is equivalent to, but much faster than:

def unpack(x,n):
r = []
m = 2**n
while x:
    x, temp = divmod(x,m)
    r.append(temp)
return r

Some documentation is available via help(gmpy2.unpack) but better documentaion is on my to-do list.

The reason that unpack() can be used to eliminate the % operation is the same as the adding the digits of base-10 number to check for divisibility for 9. In this case, unpack() creates p-bit numbers and we are dividing by 2**p - 1.

Here is some test code:

import gmpy2
import time

def mersenne1(p):
    '''Primality test for Mersenne prime: 2**p -1.
    Uses native Python longs. Does not verify that p is prime.'''

    s = 4
    M = 2**p - 1
    for i in range(p-2):
        s = ((s*s)-2) % M
    return False if s else True

def mersenne2(p):
    '''Primality test for Mersenne prime: 2**p -1.
    Uses gmpy2.mpz. Does not verify that p is prime.'''

    s = gmpy2.mpz(4)
    M = gmpy2.mpz(2)**p - 1
    for i in range(p-2):
        s = ((s*s)-2) % M
    return False if s else True

def mersenne3(p):
    '''Primality test for Mersenne prime: 2**p -1.
    Uses gmpy2.mpz and no mod. Does not verify that p is prime.'''

    s = gmpy2.mpz(4)
    M = gmpy2.mpz(2)**p - 1
    for i in range(p-2):
        s = (s*s)
        s = sum(gmpy2.unpack(s, p))
        s = sum(gmpy2.unpack(s, p))
        if s < 2:
            s = M - 2 + s
        else:
            s = s - 2
    return False if s else True

if __name__ == "__main__":
    p = 44497

    start = time.time()
    result1 = mersenne1(p)
    print("Elapsed time: {:6.3f}".format(time.time() - start))

    start = time.time()
    result2 = mersenne2(p)
    print("Elapsed time: {:6.3f}".format(time.time() - start))

    start = time.time()
    result3 = mersenne3(p)
    print("Elapsed time: {:6.3f}".format(time.time() - start))

    if result1 == result2 == result3:
        print("All three tests are equal!")
    else:
        print("Oops, something has gone wrong.")

And some running times...

C:\x64\Python32>python.exe mersenne.py
Elapsed time: 163.683
Elapsed time: 12.782
Elapsed time:  3.630
All three tests are equal!

Thanks for the reply. Good call on the 64-bit versions of python and gmpy. That seems to have fixed the problem. Also, I cannot use `pow(s, 2, M)`, I think, because my actual line of code is `s = ((s * s) - 2) % M`. I just left it out for simplicity of the code example. — Ryan Peschel, Oct 04 '11 at 00:00
@RyanPeschel It looks like you looking for Mersenne primes. If so, there should be significant performance improvement by replacing the % with some additions. If M=2**n-1, then x % M can be calculated by splitting x into n-bit chunks and adding the chunks. You will need to repeat split/add twice, but it will be faster for large n. The command would be `x=sum(gmpy2.unpack(x, n))`. Experiment with that hint and I'll can post an example later. — casevh, Oct 04 '11 at 00:42
@casevh: Yes I am! I would also greatly appreciate an example on how to further optimize it. I tried looking in the documentation for unpack but I cannot seem to find anything on it.. Also it's giving me an error saying it expects ints when I'm using xmpz's. — Ryan Peschel, Oct 04 '11 at 02:13
Ahh, apparently if I use mpz's instead of xmpz's for unpack it runs. Still, I'm not exactly sure what summing unpack does as I cannot seem to find any documentation on them. — Ryan Peschel, Oct 04 '11 at 02:27
@user2284570 There is bug in GMP on 64-bit Windows systems. GMP assumes `long` will be 64-bit but it is only a 32-bit type on Windows. I am testing a change to GMP that should fix the issue. `gmpy2.mpz()` will convert a string to an `mpz` type. — casevh, Mar 28 '16 at 23:45
@casevh : remember cygwin integers size aren’t necessarily the one used by msys/windows. *(cygwin could be implemented as subsystem but they done it over the windows one)* — user2284570, Mar 28 '16 at 23:49
@user2284570 My comments are based on msys2 (which is based on cygwin) and the common mingw-w64 compiler suite. I am not aware of any 64-bit Windows C compiler that treats `long` as a 64-bit type and produces standard DLLs and applications. — casevh, Mar 28 '16 at 23:56
@casevh : cygwin as evolved in what could be windows subsystem since the msys fork. — user2284570, Mar 29 '16 at 00:16
@user2284570 I've learned something today. The cygwin64 model does use use `long` as a 64-bit type. It probably doesn't help with making an extension module for the commonly used Windows Python interpreters, but it is nice to know. Thanks. — casevh, Mar 29 '16 at 00:36

Trying to calculate large numbers in Python with gmpy. Python keeps crashing?

1 Answers1