
Could you help me with this Python cuML error?

UnownedMemory requires explicit device ID for a null pointer.

I am running leave-one-out cross-validation with a cuML Random Forest. At each step of the for-loop, I combine all the pandas DataFrames except one, train a random forest on the combined dataset, and check the error on the left-out dataset. After this has run for the first three datasets, I get the error above. I am not sure what I am doing wrong in terms of memory management. Each dataset is quite big (2 GB), and there are 29 datasets. However, the GPU completes three rounds of training on 28 datasets without any issues. I have Python 3.9 and CuPy 10.6.0.

import cudf
import cuml
import cupy
import pandas as pd
import numpy as np
import os
import time



def cross_validation(dfs):
    try:
        frac = 0.4
        print('Cross validation on', len(dfs), 'datasets.')
        names = dfs.keys() #dfs is a dictionary of dataset names : datasets

        #Iterate over each dataframe, to keep it out.
        for nam in names:
            #Keep out the dataset corresponding to nam.
            print('Leaving',nam,'out.')
            start = time.time()
            mdfs = dfs.copy()
            mdfs.pop(nam)

            #Have the odd dataset ready for testing.
            df = dfs[nam]

            #Compile, train, and test.
            train = pd.DataFrame()
            for mdf in mdfs: 
                train = pd.concat([train, mdfs[mdf].sample(frac = frac)], axis =0, ignore_index = True)
            print('Training now.')

            y_train = train['ActualError']

            train = train.drop(columns=['ActualError'])

            regr = cuml.RandomForestRegressor(n_estimators=20, min_samples_leaf=200)
            regr.fit(train, y_train)
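            # Use the same feature columns for testing as for training.
            test_features = df.drop(columns=['ActualError'])
            prediction = regr.predict(test_features)
            # Score against the held-out dataset (df); 'test' was never defined.
            error = np.mean(np.abs(df['copdem'] - prediction - df['bench']))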
            print('survived', cupy.get_default_memory_pool().used_bytes(), cupy.get_default_memory_pool().total_bytes())
        return 0
    except Exception as e:
        print(e)
        return -1

The used bytes and total bytes reported by the CuPy default memory pool are always 0. I have tried deleting the random forest, freeing all blocks in the default memory pool, and switching memory allocators, but nothing has helped.
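
For reference, the split-building part of the loop above can be isolated from the GPU code. The following is a minimal CPU-only sketch under stated assumptions, not the code that produced the error: `leave_one_out_splits` and its `target` and `seed` parameters are hypothetical names introduced here for illustration. It samples each remaining dataset, concatenates once per fold instead of growing `train` inside a loop, and drops its own references before building the next fold.

```python
import gc

import pandas as pd


def leave_one_out_splits(dfs, frac=0.4, target="ActualError", seed=0):
    """Yield (name, X_train, y_train, held_out) with each dataset left out in turn.

    Hypothetical helper: `dfs` maps dataset names to pandas DataFrames that
    all contain the `target` column.
    """
    for name in dfs:
        # Sample every dataset except the held-out one.
        parts = [
            df.sample(frac=frac, random_state=seed)
            for key, df in dfs.items()
            if key != name
        ]
        # Concatenate once per fold rather than once per dataset.
        train = pd.concat(parts, axis=0, ignore_index=True)
        # pop() removes the target column from train and returns it.
        y_train = train.pop(target)
        yield name, train, y_train, dfs[name]
        # Drop this fold's local references before building the next fold.
        del parts, train, y_train
        gc.collect()
```

Each yielded `X_train` and `y_train` pair could be passed to `regr.fit` in place of the inline concatenation. Whether this changes anything about the `UnownedMemory` error is not known; it only keeps the host-side pandas copies from accumulating between folds.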
