Questions tagged [rapids]

RAPIDS is a framework for accelerated machine learning and data science on GPUs

Questions pertaining to RAPIDS. From https://rapids.ai/ :

The RAPIDS suite of open source software libraries gives you the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA® CUDA® primitives for low-level compute optimization, but exposes that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.

RAPIDS also focuses on common data preparation tasks for analytics and data science. This includes a familiar DataFrame API that integrates with a variety of machine learning algorithms for end-to-end pipeline accelerations without paying typical serialization costs. RAPIDS also includes support for multi-node, multi-GPU deployments, enabling vastly accelerated processing and training on much larger dataset sizes.
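As a sketch of the "familiar DataFrame API" described above: cuDF deliberately mirrors the pandas API, so the same code can target GPU or CPU by swapping the import. The alias `xdf` below is a hypothetical name for illustration, and the fallback lets the snippet run without a GPU stack:

```python
# cuDF mirrors the pandas API, so the same DataFrame code can target
# the GPU (cudf) or the CPU (pandas) by swapping the import.
try:
    import cudf as xdf  # GPU-accelerated DataFrame library (RAPIDS)
except ImportError:
    import pandas as xdf  # CPU fallback with the same API surface

df = xdf.DataFrame({"label": [0, 1, 0, 1], "value": [1.0, 2.0, 3.0, 4.0]})

# Familiar pandas-style groupby/aggregation, executed on GPU under cuDF.
means = df.groupby("label")["value"].mean()
print(float(means.loc[1]))  # mean of "value" where label == 1
```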

195 questions
2
votes
2 answers

Is there a way to run rapids time series modules (ARIMA, ESM) on multi-gpu?

I have a node with 2 Tesla P100 GPUs. When I run rapids.tsa.ARIMA (or ESM), it only utilises one of the GPUs. Is there a way to utilise multiple GPUs for training the models, as with rapids-dask-xgboost?
kitz
  • 63
  • 4
2
votes
1 answer

How can I use xgboost.dask with gpu to model a very large dataset in both a distributed and batched manner?

I would like to utilise multiple GPUs spread across many nodes to train an XGBoost model on a very large data set within Azure Machine Learning using 3 NC12s_v3 compute nodes. The dataset size exceeds both VRAM and RAM size when persisted into Dask,…
HowdyEarth
  • 63
  • 8
2
votes
1 answer

RAPIDS in Colab AttributeError: module 'cudf' has no attribute '_lib'

I installed RAPIDS in Colab with no issues until I tried to import the cuml library. Fortunately, I have a Tesla T4 as the GPU. This is how I installed RAPIDS: # clone RAPIDS AI rapidsai-csp-utils scripts repo >> !git clone…
2
votes
1 answer

Need Help In Converting cuDF Dataframe to cupy ndarray

I want to convert a cuDF dataframe to cupy ndarray. I'm using this code below: import time import numpy as np import cupy as cp import cudf from numba import cuda df = cudf.read_csv('titanic.csv') arr_cupy =…
Md Kaish Ansari
  • 251
  • 2
  • 7
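On the cuDF-to-CuPy question above: recent cuDF releases expose `DataFrame.to_cupy()`, which returns a `cupy.ndarray` directly (older releases went through DLPack via `to_dlpack()`). A minimal sketch, with a pandas/NumPy fallback so it also runs without a GPU; the column data is made up for illustration:

```python
try:
    import cudf
    df = cudf.DataFrame({"x": [1.0, 2.0, 3.0], "y": [4.0, 5.0, 6.0]})
    arr = df.to_cupy()  # cupy.ndarray on the GPU, one column per DataFrame column
except ImportError:
    # CPU analog: pandas exposes the same call shape via to_numpy().
    import pandas as pd
    df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [4.0, 5.0, 6.0]})
    arr = df.to_numpy()

print(arr.shape)  # one row per DataFrame row, one column per column
```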
2
votes
2 answers

Why do RandomForestClassifier on CPU (using SKLearn) and on GPU (using RAPIDS) get very different scores?

I am using RandomForestClassifier on CPU with SKLearn and on GPU using RAPIDS. I am doing a benchmark between these two libraries on speed-up and scoring using the Iris dataset (it is a trial; in the future, I will change the dataset for a better…
JuMoGar
  • 1,740
  • 2
  • 19
  • 46
2
votes
1 answer

How to pre-cache dask.dataframe to all workers and partitions to reduce communication need

It’s sometimes appealing to use dask.dataframe.map_partitions for operations like merges. In some scenarios, when doing merges between a left_df and a right_df using map_partitions, I’d like to essentially pre-cache right_df before executing the…
Nick Becker
  • 4,059
  • 13
  • 19
2
votes
1 answer

Why am I getting different results from Scikit-learn API vs Learning API of XGBoost?

I used the Scikit-learn API for XGBoost (in Python). My accuracy was ~68%. I used the same parameter set with the Learning API for XGBoost; my accuracy was ~60%. My understanding is that the Scikit-learn API is a wrapper around the Learning API and…
2
votes
0 answers

Options for accelerating Python code through parallelizing/ multiprocessing

Below, I've gathered 4 ways to complete the execution of code that involves sorting and updating Pandas DataFrames. I would like to apply the best methods to speed up the code execution. Am I using the best available practices? Would someone please…
Kdog
  • 503
  • 5
  • 20
2
votes
1 answer

How much overhead is there per partition when loading dask_cudf partitions into GPU memory?

PCIE bus bandwidth latencies force constraints on how and when applications should copy data to and from GPUs. When working with cuDF directly, I can efficiently move a single large chunk of data into a single DataFrame. When using dask_cudf to…
Randy Gelhausen
  • 125
  • 1
  • 5
2
votes
4 answers

Install RAPIDS library on Google Colab notebook

I was wondering if I could install the RAPIDS library (executing machine learning tasks entirely on GPU) in a Google Colaboratory notebook? I've done some research but I have not been able to find a way to do that...
Novak
  • 2,143
  • 1
  • 12
  • 22
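For the Colab installation questions above: current RAPIDS releases document pip wheels served from NVIDIA's package index, which also work in Colab's GPU runtimes. The exact package names and CUDA suffix (`-cu12` below) depend on the release and the runtime's CUDA version, so treat this as a sketch rather than a pinned recipe:

```shell
# Install cuDF (and optionally cuML) from NVIDIA's pip index; the -cu12
# suffix must match the CUDA major version of the runtime.
pip install cudf-cu12 cuml-cu12 --extra-index-url=https://pypi.nvidia.com

# Sanity check: import cudf and print its version.
python -c "import cudf; print(cudf.__version__)"
```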
1
vote
1 answer

How to parallelize GPU processing of a Dask dataframe

I would like to use Dask to parallelize the data processing for dask_cudf from a Jupyter notebook on multiple GPUs. import cudf from dask.distributed import Client, wait, get_worker, get_client from dask_cuda import LocalCUDACluster import…
mtnt
  • 31
  • 5
1
vote
1 answer

NVIDIA RAPIDS filter neither works nor raises warnings/errors

I am using Rapids 23.04 and trying to select reading from parquet/orc files based on select columns and rows. However, strangely the row filter is not working and I am unable to find the cause. Any help would be greatly appreciated. A proof of…
Quiescent
  • 1,088
  • 7
  • 18
1
vote
1 answer

Google Colab: cannot install cudf

I need help. I am using Google Colab with Python 3.10.11, CUDA version 12.0, and NVIDIA driver version 525.85.12, and I am following this tutorial on how to install cuDF…
Nata107
  • 31
  • 5
1
vote
0 answers

dask_cudf/dask read_parquet failed with NotImplementedError: large_string

I am a new user of dask/dask_cudf. I have parquet files of various sizes (11GB, 2.5GB, 1.1GB), all of which failed with NotImplementedError: large_string. My dask.dataframe backend is cudf. When the backend is pandas, read_parquet works…
stucash
  • 1,078
  • 1
  • 12
  • 23
1
vote
0 answers

dask_cudf dataframe convert column of datetime string to column of datetime object

I am a new user of Dask and RapidsAI. An excerpt of my data (in csv format): Symbol,Date,Open,High,Low,Close,Volume AADR,17-Oct-2017 09:00,57.47,58.3844,57.3645,58.3844,2094 AADR,17-Oct-2017 10:00,57.27,57.2856,57.25,57.27,627 AADR,17-Oct-2017…
stucash
  • 1,078
  • 1
  • 12
  • 23
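For datetime-string columns like the excerpt above, both pandas and cuDF expose `to_datetime` with an explicit `format` string (`cudf.to_datetime` takes the same arguments, and dask_cudf can apply it per partition). A CPU-runnable sketch using pandas, with the format string matching the sample rows shown in the question:

```python
import pandas as pd

# Two sample timestamps in the question's format, e.g. "17-Oct-2017 09:00".
df = pd.DataFrame({"Date": ["17-Oct-2017 09:00", "17-Oct-2017 10:00"]})

# Parse with an explicit format; cudf.to_datetime accepts the same call.
df["Date"] = pd.to_datetime(df["Date"], format="%d-%b-%Y %H:%M")
print(df["Date"].dt.hour.tolist())  # hours extracted from the parsed datetimes
```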