
I have code that reads data from multiple files named 001.txt, 002.txt, ..., 411.txt. I would like to read the data from each file, plot it, and save the plots as 001.jpg, 002.jpg, ..., 411.jpg.

I can do this by looping through the files, but I would like to use the multiprocessing module to speed things up.
However, when I use the code below, the computer hangs: I can't click on anything, although the mouse still moves and the sound keeps playing. I then have to power down the computer.

I'm obviously misusing the multiprocessing module with matplotlib. I have used something very similar to the code below to generate the data and save it to text files with no problems. What am I missing?

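    import multiprocessing
    import random

    # Under `ipython --pylab` these names are already in the namespace;
    # when run as a plain script they must be imported explicitly.
    from matplotlib.pyplot import figure, scatter, savefig, close

    def do_plot(number):
        fig = figure(number)

        # generate random data
        a, b = random.sample(range(1, 9999), 1000), random.sample(range(1, 9999), 1000)
        scatter(a, b)

        savefig("%03d.jpg" % (number,))
        print("Done %d" % number)
        close()


    jobs = []
    for i in (0, 1, 2, 3):
        p = multiprocessing.Process(target=do_plot, args=(i,))
        jobs.append(p)
        p.start()

    # wait for every worker, not just the last one started
    for p in jobs:
        p.join()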


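The most important thing when using multiprocessing is to run the main code of the module only in the main process. This is done by guarding it with `if __name__ == '__main__':`, so that child processes which re-import the module (as happens with the "spawn" start method, the default on Windows and on macOS in recent Python versions) do not start new workers themselves. It looks like this: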

    import matplotlib.pyplot as plt
    import numpy.random as random
    from multiprocessing import Pool


    def do_plot(number):
        fig = plt.figure(number)

        # generate random data
        a = random.sample(1000)
        b = random.sample(1000)

        plt.scatter(a, b)

        plt.savefig("%03d.jpg" % (number,))
        plt.close(fig)

        print("Done %d" % number)


    if __name__ == '__main__':
        pool = Pool()
        pool.map(do_plot, range(4))
        pool.close()  # no more tasks will be submitted
        pool.join()   # wait for the worker processes to exit

Note also that I replaced the separately created processes with a process pool, which scales better to many pictures because by default it only uses as many processes as there are CPU cores available.
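For the full job from the question (001.txt to 411.txt), a pool-based version might look roughly like the sketch below. It is only a sketch under stated assumptions: each text file is assumed to hold two whitespace-separated numeric columns (x and y), the helper names `plot_one` and `plot_all` are hypothetical, and forcing the non-interactive `Agg` backend is an addition of mine, a common way to keep worker processes away from GUI backends such as the interactive setup that IPython's `--pylab` mode enables.

```python
import glob
import os
from multiprocessing import Pool

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render straight to files, no GUI
import matplotlib.pyplot as plt
import numpy as np


def plot_one(txt_path):
    """Read one NNN.txt file and save its scatter plot as NNN.jpg next to it."""
    # Assumption: the file holds two whitespace-separated numeric columns (x, y).
    data = np.loadtxt(txt_path, ndmin=2)
    fig, ax = plt.subplots()
    ax.scatter(data[:, 0], data[:, 1])
    jpg_path = os.path.splitext(txt_path)[0] + ".jpg"
    fig.savefig(jpg_path)
    plt.close(fig)  # free the figure so long-running workers don't accumulate them
    return jpg_path


def plot_all(directory="."):
    """Plot every three-digit NNN.txt file in `directory` using a process pool."""
    paths = sorted(glob.glob(os.path.join(directory, "[0-9][0-9][0-9].txt")))
    pool = Pool()  # defaults to one worker per CPU core
    try:
        # chunksize hands each worker several files per task to cut overhead
        return pool.map(plot_one, paths, chunksize=8)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    for path in plot_all():
        print("Done", path)
```

The `chunksize` argument is a rough built-in counterpart to splitting the files into chunks by hand: the pool still balances the work dynamically, so a slow file does not hold up a whole pre-assigned chunk.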

David Zwicker
  • Using your suggestion gives the following error:

        In [2]: %run -i test.py
        Exception in thread Thread-3:
        Traceback (most recent call last):
          File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
            self.run()
          File "/usr/lib/python2.7/threading.py", line 505, in run
            self.__target(*self.__args, **self.__kwargs)
          File "/usr/lib/python2.7/multiprocessing/pool.py", line 342, in _handle_tasks
            put(task)
        PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed

    – phys_geo_person Jul 21 '14 at 13:54
  • Also, I am using Python 2.7. I see why you used a pool now. Before, I had been breaking the data into "chunks" of 4, one for each core. Thanks! – phys_geo_person Jul 21 '14 at 13:57
  • The error is weird, since as far as I can see nothing apart from `number` should need to be pickled. I tested it with Python 2.7 and it works like a charm. In fact, this code also works with Python 3.4 on my machine. – David Zwicker Jul 21 '14 at 14:34
  • I just tried your code on another computer, and it works just fine. I tried similar code on the problem computer, and it all hits the same error... I am going to accept your answer (as it worked), but do you have any idea what could be going wrong? Should I reinstall matplotlib, or is something more sinister at fault? MPI, for example? – phys_geo_person Jul 21 '14 at 14:43
  • OK, so instead of using IPython, I ran the script with plain Python and it worked, so the error is somehow IPython-related. Thanks! – phys_geo_person Jul 21 '14 at 14:49
  • Do you call `ipython` with the `--pylab` flag? That could cause problems, since it automatically enables matplotlib's interactive mode, if I'm not mistaken. – David Zwicker Jul 21 '14 at 14:55
  • Without the --pylab flag, it works fine. It also works if you put the function in an external file and then run the main code separately. It seems that this is not a new problem: http://thread.gmane.org/gmane.comp.python.ipython.user/4052 It only seems to happen when you use IPython with --pylab. Thanks very much for the help! – phys_geo_person Jul 21 '14 at 15:28
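On the pickling question: the traceback shows the failure happening inside the pool's task handler at `put(task)`, i.e. while a task is being pickled for a worker. `Pool.map` sends the worker function along with each batch of arguments, but pickle stores a plain function only by reference (its module plus its name), so normally this is cheap and succeeds. The small check below illustrates the rule with a lambda, which has no importable name; `can_pickle` and `square` are hypothetical helpers. Whether running under `%run -i` with `--pylab` breaks the function lookup in a similar way is an unconfirmed guess, not something the thread establishes.

```python
import pickle


def square(x):
    """Module-level function: pickle can store it by reference (module + name)."""
    return x * x


def can_pickle(obj):
    """Return True if `obj` survives a pickle round trip, False otherwise."""
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        # PicklingError: the by-name lookup failed (e.g. for a lambda).
        # AttributeError: raised for nested/local functions on some Python versions.
        # TypeError: raised for objects such as locks that refuse to be pickled.
        return False


if __name__ == "__main__":
    print(can_pickle(square))           # True: found as __main__.square
    print(can_pickle(lambda x: x * x))  # False: "<lambda>" cannot be looked up
```

This is also why the worker in the answer is a plain module-level `def` rather than a lambda or a nested function.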