std::bad_alloc: out_of_memory: CUDA error when importing data/running models

Question

I'm trying to upload a dataset to a NVIDA RAPIDS jupyter notebook, but this error keeps popping up when importing this dataset or when using XGBoost on a dask dataframe. The training dataset is 3.7gb in size. I only have one GPU.

Some specs:

CPU: i7 9700F @4.00GHz
GPU: 3070 8GB GDDR6
RAM: 16GB @3600MHz
Windows 11
Ubuntu 18.04.05 (running the rapids environment)
Rapids version 22.12
CUDA version 12.0
NVIDIA-SMI version 528.02

I tried using this: https://www.kaggle.com/getting-started/140636 but I think this issue goes deeper

import cudf
import dask_cudf
import dask_xgboost
import xgboost as xgb
import tensorflow as tf
import torch

!du -sh one-hot-train.csv
> 3.7G  one-hot-train.csv

!du -sh y-train.csv
> 10M   y-train.csv

# Does not work due to memory issue
X_train = cudf.read_csv('one-hot-train.csv', index_col = 0)

# This will import the data no problem
X_train = dask_cudf.read_csv('one-hot-train.csv', chunksize = "4GB")
X_train = X_train.drop(columns = ['Unnamed: 0'])

# Since the y csv is so small, it doesn't matter how it's imported
y_train = dask_cudf.read_csv('y-train.csv')
y_train = y_train.drop(columns = ['Unnamed: 0'])

xgb_params = {
    
    'learning_rate': 0.3,
    'objective': 'binary:logistic',
    'tree_method': 'gpu_hist',
    'max_depth': 6,
    'seed': 555,
    'predictor': 'gpu_predictor',
    'eval_metric': 'aucpr',
    'n_estimators': 5000,
    
}

# Does not work due to memory issue
xgb_model = dask_xgb.XGBClassifier(**xgb_params)
xgb_model.fit(X_train, y_train)

Here's the specific error:

> MemoryError: std::bad_alloc: out_of_memory: CUDA error at: ~/miniconda3/envs/rapids-22.12/include/rmm/mr/device/cuda_memory_resource.hpp

Can you incrementally train the model. Instead of pass all the data at once, I would read data in chunks and train model incrementally. — Prayson W. Daniel, Jan 24 '23 at 05:25

score 2 · Answer 1 · answered Jan 24 '23 at 14:36

Using Dask XGBoost does not help if you're using a single GPU, as the entire data still needs to fit in memory to train the model. You should either use Dask XGBoost with multiple GPUs or use a single, larger GPU to train this model. XGBoost provides an experimental external memory interface for larger-than-memory dataset training, but it's not ready for production use.

Separately, it looks like you're one-hot-encoding your data based on the file name. You don't need to one-hot-encode your data with recent versions of XGBoost. See the XGBoost Categorical documentation for more information.

std::bad_alloc: out_of_memory: CUDA error when importing data/running models

1 Answers1