This is my code in Google Colab:

import cupy as cp
import numpy as np
import joblib
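import dask_ml.model_selection as dcv
from cuml import svm  # assumed: this import is not shown in the original snippet; the answer below uses cuML's SVC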

def ParamSelection(X, Y, nfolds):
    param_grid = {'C': [0.001, 0.01, 0.1, 1, 10, 100],'kernel':['linear'], 'gamma':[0.001, 0.01, 0.1, 1, 10, 100]}
    svc = svm.SVC()
    grid_search = dcv.GridSearchCV(svc, param_grid, cv = nfolds)
    grid_search.fit(X, Y)
    print(grid_search.best_params_)
    print(grid_search.best_estimator_)
    print(grid_search.best_score_)
    return grid_search.best_estimator_

svc = ParamSelection(X_train.astype(cp.int_), y_train.astype(cp.int_), 10) 

I get this error:

TypeError                                 Traceback (most recent call last)
<ipython-input-163-56196d6a31bd> in <module>()
     15     return grid_search.best_estimator_
     16 
---> 17 svc = ParamSelection(X_train.astype(cp.int_), y_train.astype(cp.int_), 10)
     18 

9 frames
/usr/local/lib/python3.7/site-packages/cudf/core/frame.py in __array__(self, dtype)
   1677     def __array__(self, dtype=None):
   1678         raise TypeError(
-> 1679             "Implicit conversion to a host NumPy array via __array__ is not "
   1680             "allowed, To explicitly construct a GPU array, consider using "
   1681             "cupy.asarray(...)\nTo explicitly construct a "

TypeError: Implicit conversion to a host NumPy array via __array__ is not allowed, To explicitly construct a GPU array, consider using cupy.asarray(...)
To explicitly construct a host array, consider using .to_array()

I split the data with train_test_split from Dask-ML (from dask_ml.model_selection import train_test_split). I don't know where the problem is.

Any suggestions?

1 Answer

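Somewhere in its internals, Dask-ML is likely calling np.asarray on a GPU-backed object. The traceback points at cuDF's __array__, so the input is probably a cuDF DataFrame or Series rather than a CuPy array. Implicitly triggering a GPU-to-CPU (device-to-host) transfer this way is generally not permitted, so an error is raised.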
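To make the mechanism concrete, here is a small CPU-only sketch of it. DeviceFrame and its to_host method are hypothetical stand-ins invented for illustration and are not part of cuDF. The only real behavior relied on is that np.asarray calls an object's __array__ method when one is defined.

```python
import numpy as np


class DeviceFrame:
    """Toy stand-in for a GPU container such as a cuDF DataFrame (hypothetical)."""

    def __init__(self, rows):
        self._rows = [list(r) for r in rows]

    def __array__(self, dtype=None, copy=None):
        # Mirrors the guard seen in the traceback: refuse silent copies to host.
        raise TypeError("Implicit conversion to a host NumPy array is not allowed")

    def to_host(self):
        # Explicit, opt-in conversion (hypothetical method name).
        return np.array(self._rows)


frame = DeviceFrame([[1, 2], [3, 4]])

try:
    np.asarray(frame)  # roughly what a CPU-oriented library does internally
    implicit_ok = True
except TypeError:
    implicit_ok = False

host = frame.to_host()  # the explicit conversion succeeds
```

The implicit path fails while the explicit one works, which is the same split the cuDF error message describes.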

If you instead pass CPU-based data (for example, NumPy arrays) to a cuML estimator, this should work as expected:

import cupy as cp
import dask_ml.model_selection as dcv
from sklearn.datasets import make_classification
from cuml import svm

X, y = make_classification(
    n_samples=100
)

def ParamSelection(X, Y, nfolds):
    param_grid = {'C': [0.001, 10, 100],'gamma':[0.001, 100]}
    svc = svm.SVC()
    grid_search = dcv.GridSearchCV(svc, param_grid, cv = nfolds)
    grid_search.fit(X, Y)
    print(grid_search.best_params_)
    print(grid_search.best_estimator_)
    print(grid_search.best_score_)
    return grid_search.best_estimator_

svc = ParamSelection(X, y, 2)
{'C': 10, 'gamma': 0.001}
SVC()
0.8399999737739563
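If the training data already lives on the GPU, as in the question, one option is to copy it to host memory explicitly before handing it to Dask-ML. The following is a minimal sketch, assuming the installed cuDF provides to_pandas() and the installed CuPy provides ndarray.get(). to_host is a hypothetical helper name, not part of either library.

```python
import numpy as np


def to_host(data):
    """Return a host NumPy array, converting GPU containers explicitly."""
    if hasattr(data, "to_pandas"):
        # cuDF DataFrame / Series: explicit device-to-host copy via pandas.
        return data.to_pandas().to_numpy()
    if hasattr(data, "__cuda_array_interface__") and hasattr(data, "get"):
        # CuPy ndarray: .get() copies the array to host memory.
        return data.get()
    # Already host data (list, NumPy array, ...): no GPU transfer involved.
    return np.asarray(data)
```

With such a helper, the call from the question would become something like ParamSelection(to_host(X_train), to_host(y_train), 10), at the cost of one device-to-host copy per array.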
Nick Becker