I am trying to use multiprocessing to plot and save many figures. Using the code from "Saving multiple matplotlib figures with multiprocessing", I get an error:
NameError: name 'plt' is not defined
It works if I use the following code, with the library imports inside the function and with multiprocess (a fork of multiprocessing) instead of multiprocessing:
# taken from https://stackoverflow.com/questions/24866070/saving-multiple-matplotlib-figures-with-multiprocessing
from multiprocess import Pool

def do_plot(number):
    import numpy.random as random
    import matplotlib.pyplot as plt
    fig = plt.figure(number)
    # generate random data
    a = random.sample(1000)
    b = random.sample(1000)
    plt.scatter(a, b)
    plt.savefig("%03d.jpg" % (number,))
    plt.close()
    print("Done ", number)

pool = Pool(4)
pool.map(do_plot, range(4))
However, my real function depends on a big dataframe defined outside of it:
import pandas as pd
from multiprocess import Pool

df = pd.DataFrame({'A':[1,2,3],'B':[1,2,3]})

def do_plot(number):
    import matplotlib.pyplot as plt
    fig = plt.figure(number)
    # read the data from the dataframe defined outside the function
    a = df['A']
    b = df['B']
    plt.scatter(a, b)
    plt.savefig("%03d.jpg" % (number,))
    plt.close()
    print("Done ", number)

pool = Pool(4)
pool.map(do_plot, range(4))
which returns:
NameError: name 'df' is not defined
How can I use a dataframe and other objects defined outside the do_plot function?