
I am trying to calculate SHAP values for a previously trained cuML random forest and am getting the following error:

MemoryError: std::bad_alloc: CUDA error at: /opt/anaconda3/envs/rapids-21.12/include/rmm/mr/device/cuda_memory_resource.hpp

The code I am using is:

import pickle

import cupy as cp
from cuml.explainer import KernelExplainer

# Load the previously trained cuML random forest
filename = 'cuml_random_forest_model.sav'
with open(filename, 'rb') as f:
    cuml_model = pickle.load(f)

# Load the test data directly onto the GPU as a CuPy array
arr_cupy_X_test = cp.load("arr_cupy_X_test.npy")

# Build the GPU KernelExplainer around the model's predict function
cu_explainer = KernelExplainer(model=cuml_model.predict,
                               data=arr_cupy_X_test.astype(cp.float32),
                               is_gpu_model=True)
cu_shap_values = cu_explainer.shap_values(arr_cupy_X_test)

I am using gpu_usage() and torch.cuda.empty_cache() to clear GPU memory (as shown below), and I have reduced the test array arr_cupy_X_test to 100 rows, but I am still receiving the error.
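
For reference, this is roughly what that cleanup looks like (a sketch: I am assuming here that gpu_usage comes from GPUtil's showUtilization, and the slicing to 100 rows is just for illustration):

# Sketch of the GPU-memory cleanup I run between attempts.
# NOTE: gpu_usage is assumed to be GPUtil's showUtilization;
# torch.cuda.empty_cache() only releases memory cached by PyTorch itself.
import torch
from GPUtil import showUtilization as gpu_usage

gpu_usage()               # print GPU utilization before cleanup
torch.cuda.empty_cache()  # drop blocks cached by PyTorch's allocator
gpu_usage()               # confirm how much memory was freed

# Reduced test set: only the first 100 rows of the test array
cu_shap_values = cu_explainer.shap_values(arr_cupy_X_test[:100])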

Is there perhaps another issue with the cuML KernelExplainer?

Any suggestions are welcome.

Reproducible code example (works with n_samples=2000, throws the error with n_samples=10000):

from cuml import RandomForestRegressor
from cuml import make_regression
from cuml import train_test_split
from cuml.explainer import KernelExplainer

# Synthetic regression data on the GPU; the error appears at n_samples=10000
X, y = make_regression(n_samples=10000, n_features=180, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=2, random_state=42)

# Train the random forest and explain it with the GPU KernelExplainer
model = RandomForestRegressor().fit(X_train, y_train)
cu_explainer = KernelExplainer(model=model.predict, data=X_train, is_gpu_model=True)
cu_shap_values = cu_explainer.shap_values(X_test)
  • Could you provide a minimal, reproducible example? With existing information, it's challenging to figure out what might be the reason for a generic memory allocation failure. – Nick Becker Feb 23 '22 at 14:58
  • @NickBecker, the following example leads to the same error if I increase n_samples: it works fine with 2000, but 10000, for example, throws the error. I added the example code to the original post. – user17974383 Feb 24 '22 at 07:49
  • Thanks, this is very helpful. On my machine, this code spikes GPU memory up to about 20GB and then throws an illegal memory access. I've filed https://github.com/rapidsai/cuml/issues/4604 to track this issue. – Nick Becker Feb 24 '22 at 15:22
