python3 memoryerror when producing a large list

Question

I'm a beginner. I recently saw the Mandelbrot set, which is fantastic, so I decided to draw it with Python. But there is a problem: I get a MemoryError when I run this code.

The statement num_set = gen_num_set(10000) produces a very large list of about 20000*20000*4 = 1600000000 complex numbers. When I use 1000 instead of 10000, the code runs successfully.
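As a quick sanity check of that count, here is a minimal sketch that reproduces the loop structure of gen_complex_num and compares the real list length with the formula 4 × (2n)². The helpers gen_quadrant and point_count are hypothetical names for illustration, not part of the original code; gen_quadrant uses sign factors in place of the four if-branches.

```python
# Hypothetical helper: a compact version of the asker's gen_complex_num.
# re_sign / im_sign (+1 or -1) select the quadrant, as the four
# if-branches do in the original.
def gen_quadrant(n, re_sign, im_sign):
    step = 1 / n          # start_point in the original
    side = 2 * n          # the original doubles n before looping
    return [complex(re_sign * i * step, im_sign * j * step)
            for i in range(side) for j in range(side)]

def point_count(n):
    """Total complex values gen_num_set(n) builds: 4 quadrants of (2n) x (2n)."""
    return 4 * (2 * n) ** 2

# Check the formula against the actual loops for a small n.
quadrants = [(1, 1), (-1, 1), (-1, -1), (1, -1)]
small = sum(len(gen_quadrant(3, r, i)) for r, i in quadrants)
print(small, point_count(3))   # 144 144
print(point_count(1000))       # 16000000
print(point_count(10000))      # 1600000000
```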

My computer has 4GB of memory and runs Windows 7 32-bit. I want to know whether this problem is a limit of my computer or whether there is some way to optimize my code.

Thanks.

#!/usr/bin/env python3.4

import matplotlib.pyplot as plt
import numpy as np
import random, time
from multiprocessing import Pool

def first_quadrant(n):
    start_point = 1 / n
    n = 2*n
    return gen_complex_num(start_point,n,1)        

def second_quadrant(n):
    start_point = 1 / n
    n = 2*n
    return gen_complex_num(start_point,n,2)

def third_quadrant(n):
    start_point = 1 / n
    n = 2*n
    return gen_complex_num(start_point,n,3)

def four_quadrant(n):
    start_point = 1 / n
    n = 2*n
    return gen_complex_num(start_point,n,4)

def gen_complex_num(start_point,n,quadrant):
    complex_num = []
    if quadrant == 1:        
        for i in range(n):
            real = i*start_point
            for j in range(n):
                imag = j*start_point
                complex_num.append(complex(real,imag))
        return complex_num
    elif quadrant == 2:
        for i in range(n):
            real = i*start_point*(-1)
            for j in range(n):
                imag = j*start_point
                complex_num.append(complex(real,imag))
        return complex_num
    elif quadrant == 3:
        for i in range(n):
            real = i*start_point*(-1)
            for j in range(n):
                imag = j*start_point*(-1)
                complex_num.append(complex(real,imag))
        return complex_num
    elif quadrant == 4:
        for i in range(n):
            real = i*start_point
            for j in range(n):
                imag = j*start_point*(-1)
                complex_num.append(complex(real,imag))
        return complex_num            

def gen_num_set(n):
    return [first_quadrant(n), second_quadrant(n), third_quadrant(n), four_quadrant(n)]

def if_man_set(num_set):
    iteration_n = 10000
    man_set = []
    for c in num_set:
        # every point's orbit must start from z = 0; resetting it only
        # on escape would carry z over from a bounded point to the next c
        z = complex(0,0)
        if_man = 1
        for i in range(iteration_n):
            if abs(z) > 2:
                if_man = 0
                break
            z = z*z + c
        if if_man:
            man_set.append(c)
    return man_set


def plot_scatter(x,y):
    #plt.plot(x,y)

    color = ran_color()
    plt.scatter(x,y,c=color)
    plt.show()

def ran_num():
    return random.random()

def ran_color():
    return [ran_num() for i in range(3)]

def plot_man_set(man_set):
    z_real = []
    z_imag = []
    for z in man_set:
        z_real.append(z.real)
        z_imag.append(z.imag)
    plot_scatter(z_real,z_imag)


if __name__ == "__main__":
    start_time = time.time()
    num_set = gen_num_set(10000)    
    with Pool(processes=4) as pool:
        #use multiprocess
        set_part = pool.map(if_man_set, num_set)
    man_set = []
    for i in set_part:
        man_set += i
    plot_man_set(man_set)
    end_time = time.time()
    use_time = end_time - start_time
    print(use_time)

The error is to do with the amount of memory your computer has. — muddyfish, Aug 17 '15 at 08:14
It's possible that after a reboot and running nothing else, you have 1.6 GB memory available. — Weather Vane, Aug 18 '15 at 17:00
How big an image do you want to create (in pixels)? Your code is unnecessarily complicated because you want to optimize first by using multiple processes. If you are a beginner, I suggest throwing away everything except the Mandelbrot code and optimizing later, once it works properly. — karatedog, Sep 12 '15 at 09:45
I want to see the details of the picture; I don't know how big the image will be. I've been trying to optimize the Mandelbrot code recently. I haven't been able to log in to Stack Overflow these days >_<. — 轩字语, Sep 29 '15 at 04:41
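Following karatedog's suggestion to keep only the Mandelbrot code at first, a minimal single-process sketch of that core might test one point at a time. The function name in_mandelbrot and the max_iter default of 1000 are illustrative assumptions. It resets z to 0 for every point, which the iteration requires, and treats "did not escape within max_iter steps" as "in the set".

```python
def in_mandelbrot(c, max_iter=1000):
    """Return True if c appears to be in the Mandelbrot set.

    Iterates z -> z*z + c starting from z = 0 for this c.
    If |z| ever exceeds 2 the orbit escapes, so c is not in the set;
    surviving max_iter iterations is treated as "in the set".
    """
    z = 0j
    for _ in range(max_iter):
        if abs(z) > 2:
            return False
        z = z * z + c
    return True

# Tiny text-mode preview: real part -2..0.5, imaginary part 1..-1.
for y in range(10, -11, -2):
    print("".join("#" if in_mandelbrot(complex(x / 20, y / 10)) else "."
                  for x in range(-40, 11)))
```

Nothing here stores more than one point at a time, so memory use stays tiny regardless of resolution; speed can be tackled once the picture looks right.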

score 3 · Accepted Answer · answered Aug 17 '15 at 08:15

You say you are creating a list with 1.6 billion elements. Each of those is a complex number which contains 2 floats. A Python complex number takes 24 bytes (at least on my system: sys.getsizeof(complex(1.0,1.0)) gives 24), so you'll need over 38GB just to store the values, and that's before you even start looking at the list itself.
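The 38GB figure can be reproduced from the per-object size. A short sketch using the standard sys module; note that sys.getsizeof(complex) depends on the Python build (24 on the answerer's system, commonly 32 on 64-bit CPython), so the measured total may be even larger than 38GB.

```python
import sys

N = 1_600_000_000                            # elements from gen_num_set(10000)
obj_size = sys.getsizeof(complex(1.0, 1.0))  # bytes per complex object; build-dependent
values_bytes = N * obj_size                  # the objects alone, before the list itself
print(obj_size, "bytes each ->", values_bytes / 1e9, "GB")
print("with 24-byte objects:", N * 24 / 1e9, "GB")   # 38.4 GB
```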

Your list of 1.6 billion elements won't fit at all on a 32-bit system: with 4-byte pointers, the pointer array alone is 6.4GB, more than the 4GB a 32-bit address space can reach. On a 64-bit system, with 8-byte pointers, you will need at least 12.8GB just for the pointers.
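The pointer arithmetic can be checked on your own interpreter. This sketch uses the standard struct and sys modules: struct.calcsize("P") gives the native pointer size, and comparing sys.getsizeof of an n-element list with an empty one shows the per-element cost of a list. That comparison assumes CPython allocates [None] * n with exactly n slots, which is an implementation detail rather than a language guarantee.

```python
import struct
import sys

N = 1_600_000_000                  # elements from gen_num_set(10000)
ptr_size = struct.calcsize("P")    # 4 on a 32-bit Python, 8 on a 64-bit one

# Per-element cost of a CPython list, measured on this interpreter.
# Assumes CPython allocates [None] * n with exactly n slots.
n = 1000
per_element = (sys.getsizeof([None] * n) - sys.getsizeof([])) // n
print("pointer size:", ptr_size, "| list bytes per element:", per_element)

print("pointers, 32-bit:", N * 4 / 1e9, "GB")      # 6.4 GB
print("pointers, 64-bit:", N * 8 / 1e9, "GB")      # 12.8 GB
print("32-bit address space:", 2**32 / 1e9, "GB")  # about 4.29 GB
```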

So there's no way you're going to do that unless you upgrade to a 64-bit OS with maybe 64GB of RAM (though it might need more).

Thanks Duncan, you've shown me a new way to analyse code. I've just read 'Think Python: How to Think Like a Computer Scientist', so I need to learn more. — 轩字语, Aug 18 '15 at 02:50

m00am · Answer 2 · 2020-02-27T08:06:58.267

When handling large data like this, you should prefer NumPy arrays over Python lists. There is a nice post explaining why (What are the advantages of NumPy over regular Python lists?), but I will try to sum it up.

In Python, each complex number in your list is a full object (with methods and attributes), which carries some overhead. That is why each one takes 24 bytes (as Duncan pointed out) instead of just the 16 bytes for its two floats (Python floats are 64-bit C doubles).
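The split between payload and overhead can be measured directly. A sketch using sys and struct: struct.calcsize("dd") is the size of two C doubles, and the remainder of sys.getsizeof(complex) is the object header (on CPython, the reference count and type pointer). The exact total depends on the build.

```python
import struct
import sys

payload = struct.calcsize("dd")            # two C doubles: the actual numbers
total = sys.getsizeof(complex(1.0, 1.0))   # object header + payload
overhead = total - payload                 # per-object header (CPython: refcount + type pointer)
print("payload:", payload, "total:", total, "overhead:", overhead)
print("float mantissa bits:", sys.float_info.mant_dig)  # 53, i.e. IEEE 754 double
```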

NumPy arrays are built on C-style arrays (basically all values written next to each other in memory as raw numbers, not objects). They lack some of the nice functionality of Python lists (like cheap appending) and are restricted to a single data type. They save a lot of space, though, because there is no per-object overhead and no per-element pointer. With the dtype np.complex64 (two 32-bit floats), each complex number takes 8 bytes instead of 24; the default complex dtype, np.complex128 (two 64-bit floats), takes 16 bytes.
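The per-element sizes can be read straight from NumPy's dtype objects. A short sketch using np.dtype(...).itemsize and ndarray.nbytes:

```python
import numpy as np

for dt in (np.complex64, np.complex128):
    d = np.dtype(dt)
    print(d.name, d.itemsize, "bytes per element")
# complex64 8 bytes per element
# complex128 16 bytes per element

a = np.zeros(1000, dtype=np.complex64)
print(a.nbytes)   # 8000: raw values only, no per-element objects or pointers
```

The trade-off of complex64 is precision: float32 carries only about 7 significant decimal digits, which limits how far you can zoom into the set before pixels collapse together.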

While Duncan is right and the big instance you tried will not run even with NumPy (1.6 billion × 8 bytes is still 12.8GB), it might help you process bigger instances than you can now.
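To see which sizes become feasible, a sketch estimating the raw array size for gen_num_set(n) stored in NumPy. The helper grid_bytes is a hypothetical name; it counts only the values themselves, not temporaries created while filling or processing the array.

```python
import numpy as np

def grid_bytes(n, dtype=np.complex64):
    """Raw bytes for the 4 * (2n)**2 points of gen_num_set(n) in a NumPy array."""
    return 4 * (2 * n) ** 2 * np.dtype(dtype).itemsize

print(grid_bytes(1000) / 1e9, "GB")                   # 0.128 GB
print(grid_bytes(10000) / 1e9, "GB")                  # 12.8 GB
print(grid_bytes(10000, np.complex128) / 1e9, "GB")   # 25.6 GB
```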

As you have already imported numpy, you could change your code to use NumPy arrays instead. Please mind that I am not too proficient with NumPy and there most certainly is a better way to do this, but here is an example with only small changes to your original code:

def gen_complex_num_np(start_point, n, quadrant):
    # create an n x n array of complex numbers
    # (np.empty allocates it without initializing the values)
    complex_num = np.empty(shape=(n, n), dtype=np.complex64)
    if quadrant == 1:
        for i in range(n):
            real = i*start_point
            for j in range(n):
                imag = j*start_point
                # fill one entry in the array
                complex_num[i, j] = complex(real, imag)
        # concatenate the array rows to get a list-like return value
        # again; ravel() avoids the extra copy that flatten() makes
        return complex_num.ravel()
    ...

Here your Python list is replaced with a 2-D NumPy array of dtype complex64. After the array has been filled, it is flattened (all row vectors are concatenated) to mimic your return format.
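The nested Python loops can also be dropped entirely. A sketch, assuming one square grid over [-2, 2) on both axes with spacing 1/n (roughly what the four quadrants cover together, without repeating the axes); gen_grid is a hypothetical name. It fills the real and imaginary parts through NumPy broadcasting, so only the final complex64 array and one small axis array are allocated.

```python
import numpy as np

def gen_grid(n, dtype=np.complex64):
    """Sketch: the [-2, 2) x [-2, 2) region as one 2-D complex array.

    Uses 4*n points per axis with spacing 1/n. No Python-level loop:
    the values are written by NumPy broadcasting.
    """
    m = 4 * n
    axis = np.linspace(-2.0, 2.0, m, endpoint=False)  # -2, -2 + 1/n, ..., 2 - 1/n
    grid = np.empty((m, m), dtype=dtype)
    grid.real[:, :] = axis[np.newaxis, :]   # column j gets real part axis[j]
    grid.imag[:, :] = axis[:, np.newaxis]   # row i gets imaginary part axis[i]
    return grid

g = gen_grid(2)
print(g.shape, g.dtype)   # (8, 8) complex64
flat = g.ravel()          # list-like 1-D view, like the flattened array above
```

This removes the loop overhead but not the size limit: gen_grid(10000) would still need roughly 12.8GB.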

Note that you would have to change the man_set lists in all other parts of your program accordingly.
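One way to adapt the rest of the program is to iterate all points at once and return a boolean mask instead of a list of members. This is a sketch of that swapped-in, vectorized approach, not the answer's own code; mandelbrot_mask and max_iter=100 are illustrative choices. Points whose |z| exceeds 2 are frozen, so they stop being updated and cannot overflow.

```python
import numpy as np

def mandelbrot_mask(c, max_iter=100):
    """Boolean array: True where c (a complex array of any shape) did not escape.

    All points start at z = 0 and are iterated together with z -> z*z + c.
    Once |z| > 2 a point is marked escaped and no longer updated.
    """
    z = np.zeros_like(c)
    alive = np.ones(c.shape, dtype=bool)   # True = not escaped yet
    for _ in range(max_iter):
        zs = z[alive]
        z[alive] = zs * zs + c[alive]      # update only the survivors
        alive &= np.abs(z) <= 2
    return alive

# Small text preview: real part -2..1, imaginary part -1.2..1.2.
xs = np.linspace(-2, 1, 60)
ys = np.linspace(-1.2, 1.2, 24)
c = xs[np.newaxis, :] + 1j * ys[:, np.newaxis]
mask = mandelbrot_mask(c)
for row in mask:
    print("".join("#" if v else "." for v in row))
```

A 2-D mask like this can be shown with matplotlib's plt.imshow (with extent set to the plotted region) instead of a scatter plot, which is much lighter than passing millions of individual points to plt.scatter.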

I hope this helps.

Thanks m00am, this is helpful. I will try NumPy arrays instead of Python lists later. I think I will see a beautiful Mandelbrot set with this ^_^. — 轩字语, Aug 18 '15 at 02:59
