Questions tagged [rapids]

RAPIDS is a framework for accelerated machine learning and data science on GPUs.

Questions pertaining to RAPIDS. From https://rapids.ai/ :

The RAPIDS suite of open source software libraries gives you the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA® CUDA® primitives for low-level compute optimization, but exposes that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.

RAPIDS also focuses on common data preparation tasks for analytics and data science. This includes a familiar DataFrame API that integrates with a variety of machine learning algorithms for end-to-end pipeline accelerations without paying typical serialization costs. RAPIDS also includes support for multi-node, multi-GPU deployments, enabling vastly accelerated processing and training on much larger dataset sizes.

195 questions
0 votes, 1 answer

Multiple Spark Executors on single GPU

We are trying to improve Spark job processing performance by adding GPUs to the nodes. But after enabling Spark 3 with GPUs, we are seeing a drop in Spark job performance, due to the limited number of Spark executors created with GPU…
Manju N
0 votes, 0 answers

Feature Selection, Outlier Removal, Target Transformer for Dask-ML pipelines

While feature selection (FS), outlier removal (OR), and target transformers (TT) have well-established components in "classic" scikit-learn pipelines, the documentation of dask-ml and RAPIDS omits them entirely. What are the best practices to implement Feature Selection, Outlier Removal, Target Transformer in dask-ml…
0 votes, 1 answer

RAPIDS pip installation issue

I've been trying to install RAPIDS in my Docker environment, which initially went smoothly. However, over the past week or two, I've been encountering an error. The issue seems to be that pip is attempting to fetch from the default PyPI…
Steven
0 votes, 1 answer

NVIDIA RAPIDS: Non-Euclidean metric in cuML UMAP

I am trying to use a GPU (A100) to speed up UMAP. I am facing a problem: the Euclidean metric does not seem to work for me at all, but correlation/cosine are promising. However, the code I am using below seems to produce only the Euclidean metric…
Quiescent
0 votes, 1 answer

Why can't I install cuML on wsl?

Installing RAPIDS and cuML is not working. I have CUDA installed (Cuda compilation tools, release 11.8, V11.8.89) in a Python 3.10.11 environment in a Jupyter Notebook in VS Code on WSL2, on a desktop running Windows 11 with the latest NVIDIA drivers. This is what nvidia-smi…
0 votes, 1 answer

Troubleshooting cudf.tokenize(): 'Length Mismatch' error with non-space delimiters

cuDF tokenize element length mismatch. This is the expected result for tokenize(' ') on the space character:
0    Due
0    to
0    being
0    on
0    FMLA
…
0 votes, 0 answers

RAPIDS cuML linear regression running slower than statsmodels.api equivalent?

This is my first time posting here, so my apologies if this is the wrong place to ask or if I'm missing info. Basically, I have the following code for a linear regression model using statsmodels and cuML, and I expected the RAPIDS version to be…
Resh
0 votes, 1 answer

rapidsai (DGA Streamz): ERROR: module dask has no attribute distributed

For the last few days, I have been trying to run the DGA detection streamz example in the rapidsai CLX streamz Docker container, without any resolution. I'm following the instructions on the RAPIDS website:…
Swooz
0 votes, 0 answers

Creating a conda environment with cuml and tensorflow-gpu dependencies gives an error

I am trying to create a conda env with the following environment.yml file:
name: myenv
channels:
  - rapidsai
  - conda-forge
  - nvidia
dependencies:
  - python=3.10
  - cudf=23.04
  - cuml=23.04
  - cugraph=23.04
  - cuspatial=23.04
  - cuxfilter=23.04
  -…
raymond.mh.ng
0 votes, 1 answer

Install an older version of RAPIDS AI using Docker

How do I install an older version of RAPIDS AI, such as 22.06, using Docker? The newest version, 23.02, doesn't work on any VAST AI (https://vast.ai/) machine.
0 votes, 0 answers

Latest version of RAPIDS cuML in Kaggle notebooks

First of all, I am fairly new to running models on GPUs, so sorry in advance for any basic questions. I use RAPIDS cuML to GPU-accelerate some algorithms, but I noticed I cannot use the latest version (23.2.0) in a Kaggle notebook. When importing cuML,…
0 votes, 1 answer

Correctly zipping two columns with different data types in cuDF

I have the following DataFrame in cuDF:
   Context                                            Questions
0  Architecturally, the school has a Catholic cha...  [To whom did the Virgin Mary allegedly…
JOKKINATOR
0 votes, 1 answer

Using RAPIDS with kmeans imputation in Python

I was just wondering if anyone has been able to successfully use RAPIDS with KNN imputation. I know cuml.impute was available previously, but it seems like it has now been removed. If anyone has a suggestion, that would be great. I tried using…
0 votes, 0 answers

Can I use a custom tokenizer with the TF-IDF vectorizer in the cuML library?

I have tried to make TF-IDF embeddings, but my corpus isn't small: the amount I would use is about 300–500k, and the maximum input length I would set is 450. I learned that I can handle a large sparse matrix with sklearn's HashingVectorizer, but I…
Tae-su
0 votes, 0 answers

How to improve GPU utilization for a faster calculation?

I am new to RAPIDS. I installed RAPIDS 23.02 on Ubuntu, on a machine with a 6 GB GPU and 32 GB of RAM. When I run a program that uses only RAPIDS for acceleration, GPU memory usage is only 3 GB (per nvidia-smi), and RAM usage is only 7.5 GB. I have tried…