Questions tagged [rapids]

RAPIDS is a framework for accelerated machine learning and data science on GPUs

Questions pertaining to RAPIDS. From https://rapids.ai/ :

The RAPIDS suite of open source software libraries gives you the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA® CUDA® primitives for low-level compute optimization, but exposes that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.

RAPIDS also focuses on common data preparation tasks for analytics and data science. This includes a familiar DataFrame API that integrates with a variety of machine learning algorithms for end-to-end pipeline accelerations without paying typical serialization costs. RAPIDS also includes support for multi-node, multi-GPU deployments, enabling vastly accelerated processing and training on much larger dataset sizes.

195 questions
0 votes, 1 answer

hdbscan error when inside rapids container

I am using RAPIDS UMAP in conjunction with HDBSCAN inside a rapidsai Docker container: rapidsai/rapidsai-core:0.18-cuda11.0-runtime-ubuntu18.04-py3.7 import cudf import cupy from cuml.manifold import UMAP import hdbscan from sklearn.datasets…
Igna
0 votes, 1 answer

How to load cudf in colab?

I want to use the cudf library in my projects and installed RAPIDS. It installed successfully; here are some lines from the output: Starting to prep Colab for install RAPIDS Version 0.14 stable Checking for GPU…
Nourless
0 votes, 1 answer

Getting Java output when running pyspark

I have a problem with Java sometimes failing when running PySpark in a Jupyter Notebook on Ubuntu. What I want is to see the error from the Java side, because all I usually see is a very long, generic Python error that can be summarized with…
Tomasz
0 votes, 1 answer

How to compile C++ inside RapidsAI Docker Container

When inside the RapidsAI Docker image with examples, how does one recompile the C++ code after modifying it? I've tried running the build scripts from a terminal session inside Jupyter, but it cannot find CMake.
George
0 votes, 1 answer

type error in functions to run point in polygon query on RAPIDS

I want to create a point-in-polygon query for 14 million NYC taxi trips and find out in which of the 263 taxi zones each trip was located. I want to run the code on RAPIDS cuSpatial. I read a few forums and posts, and came across cuspatial polygon…
byc
0 votes, 0 answers

RAPIDS out of memory when merging cuda dataframe and distance calculations

I'm trying out RAPIDS cudf and cuspatial, and I wonder: what are better ways to cross join two dataframes that result in 27 billion rows? I've got two datasets: one from New York City taxi trip data (14.7 million rows) containing longitude/latitude of pick…
byc
0 votes, 1 answer

How to convert a NetworkX graph into cuGraph?

I load a graph from a DOT file using NetworkX. I want to perform operations on it on the GPU with cuGraph. How do I convert a NetworkX graph into a cuGraph graph?
DuckQueen
0 votes, 1 answer

Memory allocation error on worker 0: std::bad_alloc: CUDA error

DESCRIPTION I am just trying to give a training set and a test set to the model, but I get the following errors. 1st data package - train_data = xgboost.DMatrix(data=X_train, label=y_train) Up until I run just this and do training and anything with,…
sogu
0 votes, 1 answer

dask_cudf - not respecting RMM quota, crashes

I am new to machine learning and to using GPUs, and for that reason I was excited about RAPIDS and Dask. I am running on an AWS EC2 p3.8xlarge, where I run Docker with the RAPIDS container, version 0.16. There is an EBS volume with 60GB. I have…
Tomer Cagan
0 votes, 1 answer

Unable to import "cuxfilter" package in Kaggle Notebook environment

I am working with a >5GB CSV file for a Kaggle competition. I am using cudf and cuml for data preprocessing and machine learning, but for visualization my plan was to use GPU-accelerated visualization with Plotly. Since Kaggle docker doesn't…
Aravind P
0 votes, 1 answer

Spark Rapids: Simple HashAggregate Example

Hi all, I am new to Spark Rapids. I was going through the basic introduction to Spark Rapids, where I found a figure (attached) explaining the difference between CPU- and GPU-based query plans for a HashAggregate example. All things in the plans, except…
Jatin
0 votes, 1 answer

How to install the latest version of rapids, without specifying the version number

I would like to install the latest version of rapids without specifying the version number. From here: https://rapids.ai/start.html conda install -c rapidsai -c nvidia -c conda-forge -c defaults rapids=0.15 python=3.7 cudatoolkit=10.1 which works…
vgoklani
0 votes, 2 answers

MemoryError: std::bad_alloc: rapids.ai Dask-cuDF

I would like to load a 5.9 GB CSV file without using the pandas library. I have 4 GPUs. I use rapids.ai to load this large dataset faster, but every time I try, this error appears even though my other GPUs still have free memory. memory usage of…
Omid Erfanmanesh
0 votes, 1 answer

Does RAPIDS cuML library support Generalized Linear Models (GLMs)?

I think cuML does support GLMs because linear and logistic regressions are types of GLMs and cuML supports those. Best to be sure! https://github.com/rapidsai/cuml If it is not supported, is it on the roadmap?
0 votes, 1 answer

persistent pip install in rapids.ai docker container

This is probably a really stupid question, but one has got to start somewhere. I am playing with NVIDIA's rapids.ai GPU-enhanced Docker container, but this (presumably by design) does not come with PyTorch. Now, of course, I can do a pip install…
Igor Rivin