Questions tagged [cudf]

Use this tag for questions specifically related to the cuDF library or to cuDF DataFrame manipulations.

From PyPI: The RAPIDS cuDF library is a GPU DataFrame manipulation library based on Apache Arrow that accelerates loading, filtering, and manipulation of data for model training data preparation. The RAPIDS GPU DataFrame provides a pandas-like API that will be familiar to data scientists, so they can now build GPU-accelerated workflows more easily.

146 questions

1 answer

how to use tqdm progress bar in dask_cudf and cudf

I can use a tqdm progress bar in pandas, for example: tqdm.pandas() df = df['var'].progress_apply(lambda x: something(x)). Can I do the same thing in cudf or dask_cudf? If not, how can I use a tqdm progress bar with them?

asked Jul 31 '21 at 16:21

user14954599

1 answer

searching index with cudf dataframe doesn't work with numpy

I just loaded the CSV file with cudf (rapidsai) to reduce the time it takes. An issue comes up when I try to search the index with a condition where df['X'] = A. Here is my code example: import cudf, io, requests df = cudf.read_csv('fileA.csv') # X…

pandas numpy cudf

asked Jul 08 '21 at 02:02

Brian Lee

1 answer

Gaps in nvvp timeline when running rapids with spark

I'm running an SQL query against a CSV generated with tpch-dbgen. I am running it with one thread/task for simplicity, and I see gaps in the timeline, as shown in the attached image. Are they disk operations? Can this overhead be somehow relaxed or…

rapids cudf

asked Jun 20 '21 at 19:25

Eyal Hirsch

1 answer

Out of memory error with Dask and cudf loop

I am using Dask and Rapidsai to run an xgboost model on a large (6.9GB) dataset. The hardware is 4x 2080 TIs with 11 GB of memory each. The raw dataset has a few dozen target columns that have been one-hot encoded, so I am trying to run a loop that…

python dask rapids cudf dask-ml

asked Jun 08 '21 at 16:01

datahappy

1 answer

AttributeError: 'cupy.core.core.ndarray' object has no attribute 'iloc'

I am trying to split data into training and validation sets. For this I am using train_test_split from the cuml.preprocessing.model_selection module, but got an…

python machine-learning rapids cudf

asked May 03 '21 at 14:03

Sudhanshu

1 answer

RAPIDS: How to use one dataframe in a UDF called with apply_rows of another dataframe?

For each row in dataframe A, I need to query DF B. I need to do something like this: filter B rows by values in column b1 (B.b1) which are in a range defined by columns A.a1 and A.a2 and assign combined values to column A.a3. In pandas that would be…

python pandas rapids cudf

asked Apr 19 '21 at 03:04

Peter

1 answer

cuDF: an alternative of Pandas Groupby + Shift?

I have a DF on which I want to use Groupby + Shift. I can do this in pandas, but I cannot do it in cuDF because it is not implemented yet: see Issue #7183. The feature request was made long ago, so it seems like they will not implement this in the…

pandas rapids cudf

asked Mar 30 '21 at 02:23

Minh-Long Luu

1 answer

hdbscan error when inside rapids container

I am using RAPIDS UMAP in conjunction with HDBSCAN inside a rapidsai Docker container: rapidsai/rapidsai-core:0.18-cuda11.0-runtime-ubuntu18.04-py3.7 import cudf import cupy from cuml.manifold import UMAP import hdbscan from sklearn.datasets…

cupy rapids cudf hdbscan

asked Mar 12 '21 at 21:38

Igna

2 answers

TypeError: data must be list or dict-like in CUDF

I am using cuDF to speed up my Python process. First, I imported cuDF, removed the multiprocessing code, and initialized the variables with cuDF. After changing to cuDF, it gives a dictionary error. How can I remove these loops to make effective…

python pandas numpy cudf

asked Feb 06 '21 at 04:10

Khawar Islam

1 answer

from numba import cuda, numpy_support and ImportError: cannot import name 'numpy_support' from 'numba'

I am switching from pandas to cudf to make aggregation faster and reduce processing time. I found a library that works on the GPU with pandas: cuDF, https://github.com/rapidsai/cudf. When I entered the below to install it in my project, it…

pandas numpy numba cudf

asked Feb 04 '21 at 12:56

Khawar Islam

1 answer

cuDF low GPU utilization

I have a task that involves running many queries on a dataframe. I compared the performance of running these queries on a Xeon CPU (Pandas) vs. RTX 2080 (CUDF). For a dataframe of 100k rows, GPU is faster but not by much. Looking at nvidia-smi…

cudf

asked Dec 28 '20 at 20:50

Yuriy S

0 answers

RAPIDS out of memory when merging cuda dataframe and distance calculations

I'm trying out RAPIDS cudf and cuspatial, and I wonder what better ways there are to cross join two dataframes when the result has 27 billion rows. I've got two datasets: one from New York City taxi trip data (14.7 million rows) containing longitude/latitude of pick…

memory gpu batch-processing rapids cudf

asked Dec 27 '20 at 09:41

byc

1 answer

Pandas DF - Cut time b/w 2 timestamps into hour bins

Say I have data of this format in a df id sta end dur 40433 2020-01-08 05:06:01 2020-01-08 05:08:14 133 40433 2020-09-22 12:01:26 2020-09-22 12:31:34 1808 40433 2020-09-22 12:05:00 2020-09-22…

python pandas numpy datetime cudf

asked Dec 15 '20 at 11:19

oompaloompa

1 answer

Python modified groupby ngroup in cuDF with list comprehension

I am trying to write a function that does something similar to pandas's groupby().ngroup() function. The difference is that I want each subgroup count to restart at 0. So given the following data: | EVENT_1 | EVENT_2 | | ------- | ------- | | …

python data-science numba cudf

asked Dec 02 '20 at 20:37

Kyle

1 answer

Memory allocation error on worker 0: std::bad_alloc: CUDA error

DESCRIPTION: I am just trying to give a training and a test set to the model, but I get the following errors. 1st data package: train_data = xgboost.DMatrix(data=X_train, label=y_train) Up until I run just this and do training and anything with,…

python python-3.x cuda rapids cudf

asked Nov 17 '20 at 16:12

sogu
