In order to use the R package DiceKriging from Python with multiprocessing, I wrote something like the example below.
Multiprocessing gives a significant speed-up, but has the unexpected side effect of quickly increasing memory consumption.
When I set flag_pool = False in the code below (hence not using multiprocessing), memory usage stays stable.
I tried forcing both the Python and the R garbage collectors (as suggested here, here, and there), without success.
How can this be avoided?
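For reference, this is the kind of check I use to watch memory between repetitions — a minimal sketch using the stdlib resource module (the helper name rss_kb is mine; note that ru_maxrss is reported in kB on Linux):

```python
import resource

def rss_kb():
    # Peak resident set size of the current process (kB on Linux)
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

print("Peak RSS: %d kB" % rss_kb())
```

With flag_pool = False this number stays roughly flat across repetitions; with the pool it keeps climbing.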
MWE:
from multiprocessing import Pool
import numpy as np
from rpy2 import robjects
robjects.r("library('DiceKriging')")

# Parameters
flag_pool = True
N_gp = 3
N_repetition = 3

# Generate data
def model_to_emulate(X):
    return X[:, 0]**3 - X[:, 1]**2

X = np.random.random_sample((1000, 2))
z = model_to_emulate(X)
list_arg = [[z, X]] * N_gp

# Emulation
def worker_km(response_design):
    response = response_design[0]
    robjects.globalenv["response"] = robjects.FloatVector(response)
    design = response_design[1]
    df = robjects.r["data.frame"]([robjects.FloatVector(column)
                                   for column in design.T])
    df.names = ["x%d" % ii for ii in xrange(design.shape[1])]
    robjects.globalenv["design"] = df
    return robjects.r("fit = km(design=design, response=response, "
                      "covtype='matern5_2')")

for _ in xrange(N_repetition):
    if flag_pool:
        print "==================== Using Pool."
        pool = Pool(N_gp)
        out = pool.map(worker_km, list_arg)
        pool.close()
        pool.join()
    else:
        print ">>>>>>>>>>>>>>>>>>>> Not using Pool."
        for response_design in list_arg:
            out = worker_km(response_design)
Edit: I use Ubuntu 12.04.4, Python 2.7.3, R 2.14.1 and rpy2 2.2.5.
I asked a similar question here with an MWE that should be easier to run.