I have a Python-based machine learning pipeline that runs three algorithms on my data: Random Forest (scikit-learn implementation), Gradient Boosting (XGBoost implementation), and a Recurrent Neural Network (Theano/Keras implementation). The first two run on the CPU and are parallelized using joblib.Parallel(), and the latter runs on the GPU using CUDA. I pass the name of the algorithm I want to use (RF, XGB, or NN) in the variable method to a function, which then runs a parallel implementation of that algorithm:
from joblib import Parallel, delayed

if method in ['RF', 'XGB']:
    # CPU-based models: evaluate the sampled hyper-parameter configurations
    # in parallel worker processes via joblib
    with Parallel(n_jobs=8) as parallel_param:
        ...
        model_scores = parallel_param(
            delayed(model_selection_loop)(p_idx, ..., method, ...)
            for p_idx in range(num_parameter_samples))
        ...
elif method == 'NN':
    # GPU-based model: evaluate the configurations sequentially and let
    # Theano/CUDA parallelize on the GPU
    ...
    model_scores = []
    for p_idx in range(num_parameter_samples):
        model_scores.append(model_selection_loop(p_idx, ..., method, ...))
    ...
else:
    raise ValueError("Unknown algorithm!")
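To illustrate the run order I describe below, I call this dispatch roughly as follows from my IPython session (the wrapper name run_models and its arguments are illustrative placeholders, not my exact code):

rf_scores = run_models(X, y, method='RF')    # CPU, parallelized with joblib
xgb_scores = run_models(X, y, method='XGB')  # CPU, parallelized with joblib
nn_scores = run_models(X, y, method='NN')    # GPU, Theano/CUDA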
model_selection_loop() is a function that performs nested cross-validation to select the best-performing hyper-parameters and to estimate the performance of the selected model on new data, and num_parameter_samples is the number of different hyper-parameter configurations to be sampled from a grid (I use, say, 30 different configurations).
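For context, model_selection_loop is structured roughly like the sketch below; it is simplified to a single cross-validation loop for brevity, and build_model() is a placeholder for the code that constructs the scikit-learn, XGBoost, or Keras estimator for the given method:

from sklearn.model_selection import KFold, cross_val_score

def model_selection_loop(p_idx, X, y, method, param_grid):
    # Sketch only, not my exact code: evaluate the p_idx-th sampled
    # hyper-parameter configuration with cross-validation and return its score.
    params = param_grid[p_idx]
    model = build_model(method, params)  # placeholder: builds the RF / XGB / NN estimator
    cv = KFold(n_splits=5, shuffle=True, random_state=0)
    scores = cross_val_score(model, X, y, cv=cv)
    return params, scores.mean()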
If the run order is (1) RF, (2) XGB, (3) NN, everything runs fine: the first two are parallelized on a 4-core CPU and the latter runs on a Tesla K80 GPU, and at the end I get three sets of predictions which I can later merge. However, after running the Theano-based, CUDA-parallelized neural network, if I rerun either RF or XGB, I get the following error:
--> 965 model_scores = parallel_param(delayed(model_selection_loop)(p_idx, ..., method, ...) for p_idx in range(num_parameter_samples))
/home/s/anaconda2/lib/python2.7/site-packages/joblib/parallel.pyc in __call__(self, iterable)
808 # consumption.
809 self._iterating = False
--> 810 self.retrieve()
811 # Make sure that we get a last message telling us we are done
812 elapsed_time = time.time() - self._start_time
/home/s/anaconda2/lib/python2.7/site-packages/joblib/parallel.pyc in retrieve(self)
725 job = self._jobs.pop(0)
726 try:
--> 727 self._output.extend(job.get())
728 except tuple(self.exceptions) as exception:
729 # Stop dispatching any new job in the async callback thread
/home/s/anaconda2/lib/python2.7/multiprocessing/pool.pyc in get(self, timeout)
565 return self._value
566 else:
--> 567 raise self._value
568
569 def _set(self, i, obj):
GpuArrayException: invalid argument
It seems to me that once the CUDA-based code (the recurrent neural network) has been executed, any call to Parallel() gets rerouted to the GPU instead of the CPU, since the error I get is a GpuArrayException even though I called a scikit-learn Random Forest classifier, which should run on the CPU. In fact, if my first run is ordered (1) NN, (2) RF, (3) XGB, I get the same error after the first pass finishes.
I should mention that I am running these in an IPython session, typing the commands by hand in the terminal (although I am not sure whether that matters).
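For reference, the neural network ends up on the GPU through the usual Theano device flags; the sketch below shows the kind of setup I am assuming (my exact flags may differ slightly):

import os

# Assumed setup, not necessarily my exact flags: device=cuda0 selects the
# libgpuarray/CUDA backend (the one that raises GpuArrayException), and the
# flag must be set before the first "import theano" in the session.
os.environ['THEANO_FLAGS'] = 'device=cuda0,floatX=float32'

import theano  # typically prints the mapped GPU device on import
import keras   # Keras is configured (via keras.json) to use the Theano backend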
Any ideas on why this happens, and how I can route any execution of RF or XGB to the CPU regardless of whether the NN method was previously executed on the GPU? I'd appreciate your help.