Questions tagged [rapids]

RAPIDS is a framework for accelerated machine learning and data science on GPUs.

Questions pertaining to RAPIDS. From https://rapids.ai/ :

The RAPIDS suite of open source software libraries gives you the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA® CUDA® primitives for low-level compute optimization, but exposes that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.

RAPIDS also focuses on common data preparation tasks for analytics and data science. This includes a familiar DataFrame API that integrates with a variety of machine learning algorithms for end-to-end pipeline accelerations without paying typical serialization costs. RAPIDS also includes support for multi-node, multi-GPU deployments, enabling vastly accelerated processing and training on much larger dataset sizes.

195 questions
0
votes
1 answer

Convert cuml (RAPIDS) truncatedSVD into sklearn

I have to convert code written using cuml (RAPIDS) into sklearn. I found out that in cuml.truncatedSVD the parameter n_components, which is the output dimension (number of singular values), can be equal to the number of inputs/features in cuml, but…
FiReTiTi
  • 5,597
  • 12
  • 30
  • 58
0
votes
1 answer

Gaps in nvvp timeline when running rapids with spark

I'm running a SQL query against a CSV generated with tpch-dbgen. I am running it with one thread/task for simplicity, and I see gaps in the timeline as shown in the attached image. Are they disk operations? Can this overhead be somehow reduced or…
0
votes
1 answer

Out of memory error with Dask and cudf loop

I am using Dask and RAPIDS to run an XGBoost model on a large (6.9 GB) dataset. The hardware is 4x 2080 Tis with 11 GB of memory each. The raw dataset has a few dozen target columns that have been one-hot encoded, so I am trying to run a loop that…
datahappy
  • 826
  • 2
  • 11
  • 29
0
votes
1 answer

cuML RandomForestClassifier: CUDA error with documentation example

I am trying to run, in a Jupyter notebook, the example found here and copied below from the RAPIDS cuML introduction on classification. It runs well with n_samples under 6000 (this parameter dictates the number of rows of the generated dataset). import…
Oleg
  • 161
  • 1
  • 14
0
votes
1 answer

cuGraph on Multi-GPU

Recently, I have been reading the code of cuGraph. I noticed that the Louvain and Katz algorithms are said to support multi-GPU. However, when I read the C++ code of Louvain, I cannot find any code related to multi-GPU. Specifically, according to a…
Sevaro
  • 47
  • 4
0
votes
1 answer

Error while running compare models in PyCaret 2.2 on a RAPIDS 0.19 environment (conda)

I am facing this issue with RandomForestRegressor while comparing models. My PyCaret version is 2.2, and it is running in a RAPIDS 0.19 environment.
samael247
  • 11
  • 2
0
votes
2 answers

TypeError: melt() takes 1 positional argument but 2 were given

I am trying to use the melt() function, but it shows an error about passing 2 arguments, which is really weird because I am passing id as an argument and my DataFrame has only one id column. This error only occurs when I use data which…
Sudhanshu
  • 704
  • 1
  • 9
  • 24
0
votes
1 answer

AttributeError: 'cupy.core.core.ndarray' object has no attribute 'iloc'

I am trying to split data into training and validation sets. For this I am using train_test_split from the cuml.preprocessing.model_selection module, but I got an…
Sudhanshu
  • 704
  • 1
  • 9
  • 24
0
votes
1 answer

CUML fit functions throwing cp.full TypeError

I've been trying to run RAPIDS on Google Colab Pro and have successfully installed the cuml and cudf packages; however, I am unable to run even the example scripts. TL;DR: Any time I try to run the fit function for cuml on Google Colab I get the…
Glen Moutrie
  • 295
  • 3
  • 9
0
votes
1 answer

Unable to load and compute dask_cudf dataframe into blazing table and seeing some memory related errors. (cudaErrorMemoryAllocation out of memory)

Issue: I am trying to load a file (CSV and Parquet) using Dask cuDF and am seeing some memory-related errors. The dataset can easily fit into memory and the file can be read correctly using BlazingSQL's read_parquet method. However the…
0
votes
0 answers

CUML: Random Forest Model Can't Be Trained on a Multi GPU Dask Cluster

Based on the official distributed model training example (https://github.com/rapidsai/cuml/blob/branch-0.18/notebooks/random_forest_mnmg_demo.ipynb), I used the Iris dataset to train a random forest model on a multi-GPU Dask cluster (one scheduler…
nomad
  • 1
  • 1
0
votes
1 answer

RAPIDS: How to use one dataframe in a UDF called with apply_rows of another dataframe?

For each row in dataframe A, I need to query DF B. I need to do something like this: filter B rows by values in column b1 (B.b1) which are in a range defined by columns A.a1 and A.a2 and assign combined values to column A.a3. In pandas that would be…
Peter
  • 3
  • 2
0
votes
1 answer

cuDF: an alternative to Pandas Groupby + Shift?

I have a DataFrame on which I want to use groupby + shift. I can do this in pandas, but I cannot do it in cuDF because it is not implemented yet: see Issue #7183. The feature request was made long ago, so it seems like they will not implement this in the…
Minh-Long Luu
  • 2,393
  • 1
  • 17
  • 39
0
votes
1 answer

How to rotate X-axis labels in bokeh figure in Cuxfilter?

I have the exact same issue as this question, except the implementation is within cuxfilter (RAPIDS): cux_df = cuxfilter.DataFrame.from_dataframe(test) chart0 = cuxfilter.charts.bar('index', 'count') chart0.xaxis.major_label_orientation =…
lys
  • 949
  • 2
  • 9
  • 33
0
votes
1 answer

Dask-Rapids data movement and out of memory issue

I am using Dask (2021.3.0) and RAPIDS (0.18) in my project. I perform the preprocessing on the CPU, and the preprocessed data is later transferred to the GPU for K-means clustering. But in this process, I am getting the following…
Vivek kala
  • 23
  • 3