Questions tagged [rapids]

RAPIDS is a framework for accelerated machine learning and data science on GPUs

Questions pertaining to RAPIDS. From https://rapids.ai/ :

The RAPIDS suite of open source software libraries gives you the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA® CUDA® primitives for low-level compute optimization, but exposes that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.

RAPIDS also focuses on common data preparation tasks for analytics and data science. This includes a familiar DataFrame API that integrates with a variety of machine learning algorithms for end-to-end pipeline accelerations without paying typical serialization costs. RAPIDS also includes support for multi-node, multi-GPU deployments, enabling vastly accelerated processing and training on much larger dataset sizes.
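As a sketch of the "familiar DataFrame API" described above: cuDF deliberately mirrors the pandas API, so the same code can target GPU or CPU by swapping the import. The alias `xdf` below is a hypothetical name for illustration, and the fallback lets the snippet run without a GPU stack:

```python
# cuDF mirrors the pandas API, so the same DataFrame code can target
# the GPU (cudf) or the CPU (pandas) by swapping the import.
try:
    import cudf as xdf  # GPU-accelerated DataFrame library (RAPIDS)
except ImportError:
    import pandas as xdf  # CPU fallback with the same API surface

df = xdf.DataFrame({"label": [0, 1, 0, 1], "value": [1.0, 2.0, 3.0, 4.0]})

# Familiar pandas-style groupby/aggregation, executed on GPU under cuDF.
means = df.groupby("label")["value"].mean()
print(float(means.loc[1]))  # mean of "value" where label == 1
```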

195 questions
2
votes
2 answers

Is there a way to run rapids time series modules (ARIMA, ESM) on multi-gpu?

I have a node with 2 Tesla P100 GPUs. When I run rapids.tsa.ARIMA (or ESM), it only utilises one of the GPUs. Is there a way to utilise multiple GPUs for training the models, as with rapids-dask-xgboost?
kitz
  • 63
  • 4
2
votes
1 answer

How can I use xgboost.dask with gpu to model a very large dataset in both a distributed and batched manner?

I would like to utilise multiple GPUs spread across many nodes to train an XGBoost model on a very large data set within Azure Machine Learning using 3 NC12s_v3 compute nodes. The dataset size exceeds both VRAM and RAM size when persisted into Dask,…
HowdyEarth
  • 63
  • 8
2
votes
1 answer

RAPIDS in Colab AttributeError: module 'cudf' has no attribute '_lib'

I installed RAPIDS in Colab with no issues until I tried to import the cuml library. Fortunately, I have a Tesla T4 as the GPU. This is how I installed RAPIDS: # clone RAPIDS AI rapidsai-csp-utils scripts repo >> !git clone…
2
votes
1 answer

Need Help In Converting cuDF Dataframe to cupy ndarray

I want to convert a cuDF dataframe to cupy ndarray. I'm using this code below: import time import numpy as np import cupy as cp import cudf from numba import cuda df = cudf.read_csv('titanic.csv') arr_cupy =…
Md Kaish Ansari
  • 251
  • 2
  • 7
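On the cuDF-to-CuPy question above: recent cuDF releases expose `DataFrame.to_cupy()`, which returns a `cupy.ndarray` directly (older releases went through DLPack via `to_dlpack()`). A minimal sketch, with a pandas/NumPy fallback so it also runs without a GPU; the column data is made up for illustration:

```python
try:
    import cudf
    df = cudf.DataFrame({"x": [1.0, 2.0, 3.0], "y": [4.0, 5.0, 6.0]})
    arr = df.to_cupy()  # cupy.ndarray on the GPU, one column per DataFrame column
except ImportError:
    # CPU analog: pandas exposes the same call shape via to_numpy().
    import pandas as pd
    df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [4.0, 5.0, 6.0]})
    arr = df.to_numpy()

print(arr.shape)  # one row per DataFrame row, one column per column
```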
2
votes
2 answers

Why do RandomForestClassifier on CPU (using SKLearn) and on GPU (using RAPIDS) get very different scores?

I am using RandomForestClassifier on CPU with SKLearn and on GPU using RAPIDS. I am doing a benchmark between these two libraries on speed-up and scoring using the Iris dataset (it is a trial; in the future, I will change the dataset for a better…
JuMoGar
  • 1,740
  • 2
  • 19
  • 46
2
votes
1 answer

How to pre-cache dask.dataframe to all workers and partitions to reduce communication need

It’s sometimes appealing to use dask.dataframe.map_partitions for operations like merges. In some scenarios, when doing merges between a left_df and a right_df using map_partitions, I’d like to essentially pre-cache right_df before executing the…
Nick Becker
  • 4,059
  • 13
  • 19
2
votes
1 answer

Why am I getting different results from Scikit-learn API vs Learning API of XGBoost?

I used the Scikit-learn API for XGBoost (in Python). My accuracy was ~68%. I used the same parameter set with the Learning API for XGBoost; my accuracy was ~60%. My understanding is that the Scikit-learn API is a wrapper around the Learning API and…
2
votes
0 answers

Options for accelerating Python code through parallelizing/ multiprocessing

Below, I've gathered 4 ways to complete the execution of code that involves sorting and updating Pandas DataFrames. I would like to apply the best methods to speed up the code execution. Am I using the best available practices? Would someone please…
Kdog
  • 503
  • 5
  • 20
2
votes
1 answer

How much overhead is there per partition when loading dask_cudf partitions into GPU memory?

PCIE bus bandwidth latencies force constraints on how and when applications should copy data to and from GPUs. When working with cuDF directly, I can efficiently move a single large chunk of data into a single DataFrame. When using dask_cudf to…
Randy Gelhausen
  • 125
  • 1
  • 5
2
votes
4 answers

Install RAPIDS library on Google Colab notebook

I was wondering if I could install the RAPIDS library (executing machine learning tasks entirely on GPU) in a Google Colaboratory notebook? I've done some research but I have not been able to find a way to do that...
Novak
  • 2,143
  • 1
  • 12
  • 22
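For the Colab installation questions above: current RAPIDS releases document pip wheels served from NVIDIA's package index, which also work in Colab's GPU runtimes. The exact package names and CUDA suffix (`-cu12` below) depend on the release and the runtime's CUDA version, so treat this as a sketch rather than a pinned recipe:

```shell
# Install cuDF (and optionally cuML) from NVIDIA's pip index; the -cu12
# suffix must match the CUDA major version of the runtime.
pip install cudf-cu12 cuml-cu12 --extra-index-url=https://pypi.nvidia.com

# Sanity check: import cudf and print its version.
python -c "import cudf; print(cudf.__version__)"
```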
1
vote
1 answer

How to parallelize GPU processing of a Dask dataframe

I would like to use Dask to parallelize the data processing for dask_cudf from a Jupyter notebook on multiple GPUs. import cudf from dask.distributed import Client, wait, get_worker, get_client from dask_cuda import LocalCUDACluster import…
mtnt
  • 31
  • 5
1
vote
1 answer

NVIDIA RAPIDS filter neither works nor raises warnings/errors

I am using Rapids 23.04 and trying to select reading from parquet/orc files based on select columns and rows. However, strangely the row filter is not working and I am unable to find the cause. Any help would be greatly appreciated. A proof of…
Quiescent
  • 1,088
  • 7
  • 18
1
vote
1 answer

Google Colab: cannot install cudf

I need help. I am using Google Colab with Python 3.10.11, CUDA version 12.0, and NVIDIA driver version 525.85.12, and I am following this tutorial on how to install cuDF…
Nata107
  • 31
  • 5
1
vote
0 answers

dask_cudf/dask read_parquet failed with NotImplementedError: large_string

I am a new user of dask/dask_cudf. I have parquet files of various sizes (11GB, 2.5GB, 1.1GB), all of which failed with NotImplementedError: large_string. My dask.dataframe backend is cudf. When the backend is pandas, read_parquet works…
stucash
  • 1,078
  • 1
  • 12
  • 23
1
vote
0 answers

dask_cudf dataframe convert column of datetime string to column of datetime object

I am a new user of Dask and RapidsAI. An excerpt of my data (in csv format): Symbol,Date,Open,High,Low,Close,Volume AADR,17-Oct-2017 09:00,57.47,58.3844,57.3645,58.3844,2094 AADR,17-Oct-2017 10:00,57.27,57.2856,57.25,57.27,627 AADR,17-Oct-2017…
stucash
  • 1,078
  • 1
  • 12
  • 23
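For datetime-string columns like the excerpt above, both pandas and cuDF expose `to_datetime` with an explicit `format` string (`cudf.to_datetime` takes the same arguments, and dask_cudf can apply it per partition). A CPU-runnable sketch using pandas, with the format string matching the sample rows shown in the question:

```python
import pandas as pd

# Two sample timestamps in the question's format, e.g. "17-Oct-2017 09:00".
df = pd.DataFrame({"Date": ["17-Oct-2017 09:00", "17-Oct-2017 10:00"]})

# Parse with an explicit format; cudf.to_datetime accepts the same call.
df["Date"] = pd.to_datetime(df["Date"], format="%d-%b-%Y %H:%M")
print(df["Date"].dt.hour.tolist())  # hours extracted from the parsed datetimes
```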