Questions tagged [rapids]

RAPIDS is a framework for accelerated machine learning and data science on GPUs.

Questions pertaining to RAPIDS. From https://rapids.ai/ :

The RAPIDS suite of open source software libraries gives you the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA® CUDA® primitives for low-level compute optimization, but exposes that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.

RAPIDS also focuses on common data preparation tasks for analytics and data science. This includes a familiar DataFrame API that integrates with a variety of machine learning algorithms for end-to-end pipeline accelerations without paying typical serialization costs. RAPIDS also includes support for multi-node, multi-GPU deployments, enabling vastly accelerated processing and training on much larger dataset sizes.

195 questions
0
votes
1 answer

Installing the cuda rapids + xgboost stack through conda

I'm trying to install the RAPIDS stack with CUDA through conda in a Jupyter notebook inside an AWS SageMaker Studio instance: conda install -y -c conda-forge -c rapidsai-nightly -c nvidia libgcc cudf cuml xgboost rapids-blazing It tried to…
alvas
  • 115,346
  • 109
  • 446
  • 738
0
votes
1 answer

RAPIDS, cuML on Google Colab

I'm installing RAPIDS on Google Colab Pro, but it takes a long time: the last 2 installations took over an hour instead of the roughly 15 minutes stated during installation ("Starting the RAPIDS install on Colab. This will take about 15 minutes"). Is there any…
Gozdi
  • 41
  • 1
  • 1
  • 6
0
votes
1 answer

Install RAPIDS on WSL2 Ubuntu 20.04 distribution in Windows 11

I followed the updated instructions, available here, to install RAPIDS on WSL2 Windows 11. As indicated in the instructions, I have not installed CUDA on the Ubuntu distribution. I copied the following command from the official website: …
Angel
  • 193
  • 1
  • 1
  • 4
0
votes
2 answers

Spark RAPIDS does not load (unsupported file format error for CSV and no error for Parquet)

I am using an Ubuntu 20.04.4 server with 2x NVIDIA A100 GPUs. Spark (3.3.0) works fine normally, but when I try to use GPUs through RAPIDS, it just keeps waiting without loading data. I tried loading data as CSV and Parquet files, but it fails. The…
Quiescent
  • 1,088
  • 7
  • 18
0
votes
2 answers

How to groupby with custom function in Python cuDF?

I am new to using GPUs for data manipulation, and have been struggling to replicate some Pandas functions in cuDF. For instance, I want to get the mode value for each group in the dataset. In Pandas it is easily done with custom functions: df =…
Dark Hobbit
  • 3
  • 1
  • 3
0
votes
1 answer

cuML Random Forest: Segmentation fault (core dumped) when I save the trained model

I am trying to save my RF model after training it and I get a "Segmentation fault (core dumped)". Saving it with pickle before training does not cause any problem. I have tried with other cuML algorithms, and it has let me…
0
votes
0 answers

RAPIDS on Colab

I have always used the following commands to install RAPIDS on Colab (from https://colab.research.google.com/drive/1rY7Ln6rEE1pOlfSHCYOVaqt8OvDO35J0#forceEdit=true&offline=true&sandboxMode=true) !git clone…
paka
  • 55
  • 7
0
votes
1 answer

Running out of memory in Dask cuDF

I've been trying to solve memory management issues in dask_cudf in a recent project for quite some time, but it seems I'm missing something and I need your help. I am working on a Tesla T4 GPU with 15 GiB of memory. I have several ETL steps but…
Milos
  • 1
  • 1
0
votes
1 answer

TypingError in RAPIDS cuDF user-defined function

I have a cuDF DataFrame with Close and Date columns, where Close is float64 and Date is datetime64 (%Y-%m-%d). I wanted to define a function that takes those columns as inputs and creates what is known as a Market Profile, as Data is granular, in same Date…
jack
  • 13
  • 3
0
votes
1 answer

Handle "std::bad_alloc: out_of_memory: CUDA error" in Dask-cuDF

I have a PC with an NVIDIA 3090 and 32 GB of RAM. I am loading a 9 GB CSV dataset with millions of rows and 5 columns. Whenever I run compute(), it fails and throws std::bad_alloc: out_of_memory: CUDA error. How can I handle this data on my PC? To…
jack
  • 13
  • 3
0
votes
0 answers

'sub' operator not supported in dask_cudf

I came here because of a question that arose while following the tutorial at https://docs.rapids.ai/api/cudf/nightly/user_guide/10min.html. I have a DataFrame imported from CSV with the following structure: x_tick.head() LocalTime Ask…
jack
  • 13
  • 3
0
votes
0 answers

How to read Protobuf files with Dask?

Has anyone tried reading Protobuf files with Dask? Each Protobuf file I have contains multiple records, and each record is prefixed with its length (4 bytes), as shown in the snippet. This is what the current code to read/parse these files…
0
votes
1 answer

Apply ta_py function to cuDF DataFrame - RAPIDS

I'm trying to create a new column on a cuDF DataFrame based on VWMA from ta_py: #creating df CJ_m30 = cudf.read_csv("/media/f333a/Data/CJ_m30.csv", names = ["DateTime","Bid","Ask","Open", "High", "Low", "Close"]) #trying to…
zack
  • 1
  • 2
0
votes
1 answer

Spark on RAPIDS, single node

I'm trying to run TPC-DS on RAPIDS on a single node on EMR using this guide: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-rapids.html, but I'm getting results that are worse than on CPU. That makes me think that maybe I'm not doing it right or maybe…
etiel
  • 43
  • 5
0
votes
2 answers

RAPIDS.ai dependencies cuml and cudf not found no matter how I install

I have followed every version of the instructions for the AWS EC2 setup for RAPIDS.ai: https://rapids.ai/cloud#AWS-EC2 I can confirm that I am using the exact instance type given in the instructions and following the steps exactly. When I try to use the…
stephenlcurtis
  • 180
  • 1
  • 7