Questions tagged [cudf]

Use this tag for questions specifically related to the cuDF library or to cuDF DataFrame manipulations.

From PyPI: The RAPIDS cuDF library is a GPU DataFrame manipulation library based on Apache Arrow that accelerates loading, filtering, and manipulation of data for model training data preparation. The RAPIDS GPU DataFrame provides a pandas-like API that will be familiar to data scientists, so they can now build GPU-accelerated workflows more easily.

146 questions

1 answer

how to use rapids in colab easily

When I use RAPIDS (cudf) in Colaboratory, I execute the commands as follows. But these commands usually take about 20 minutes, so I have to wait every time to use…

asked Aug 06 '22 at 00:23

nish

2 answers

How to read a large file as Pandas dataframe?

I want to read a large file (4GB) as a Pandas dataframe. Since using Dask directly still consumes maximum CPU, I read the file as a pandas dataframe, then use dask_cudf, and then convert back to a pandas dataframe. However, my code is still using…

python pandas dask kaggle cudf

asked Jul 31 '22 at 14:43

melolilili

1 answer

Rapids.ai / difference of computation with log between Pandas and cudf

Here is my code comparing cudf and pandas performance: gpuDF2 = cudf.DataFrame({'col_1': np.arange(0, 10_000_000), 'col_2': np.arange(0, 10_000_000)}) pandasDF2 = pd.DataFrame({'col_1': np.arange(0, 10_000_000),…

python pandas cupy rapids cudf

asked Jun 24 '22 at 12:56

fransua

0 answers

Using cudf to apply a function on strings

I am trying to implement cudf and just learned that using .apply is not supported with strings. I am currently trying to apply this function def ngrams(tweet): test = [x for x in re.sub(r'[^\w\s]','', tweet).split() if not x in…

python pandas cudf

asked Mar 10 '22 at 08:55

Dylan Newman

2 answers

Install cudf without conda

On Google Colab I installed conda and then cudf through conda. However, now I need to reinstall all packages like sklearn etc. which I am using in my code. Is there some way to install cudf without conda? pip no longer works with cudf. Also if there is…

python pip gpu conda cudf

asked Nov 09 '21 at 15:07

laser

1 answer

How to convert vertex-predecessor dataframe to path?

I am using cuGraph to calculate the shortest path of a graph, but instead of returning the shortest path to a particular vertex, it creates a distance-vertex-predecessor table: distance vertex predecessor 3935 0.000000 0 …

python graph-theory rapids cudf

asked Oct 27 '21 at 15:48

Tom McLean
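
The question above asks how to turn a vertex-predecessor table into a path. A minimal sketch of the usual back-tracking idea follows, in plain Python rather than cuDF. It assumes the table has already been copied into a dict that maps each vertex to its predecessor, and that the source vertex is marked with a predecessor of -1; the function name `path_to`, the `no_predecessor` parameter, and the sample data are hypothetical and do not come from the question.

```python
def path_to(target, predecessor, no_predecessor=-1):
    """Follow predecessor links from target back to the source vertex.

    predecessor: dict mapping vertex -> predecessor vertex (hypothetical layout).
    no_predecessor: sentinel marking the source vertex (assumed to be -1 here).
    """
    path = [target]
    seen = {target}
    while predecessor.get(path[-1], no_predecessor) != no_predecessor:
        prev = predecessor[path[-1]]
        if prev in seen:
            # Guard against malformed tables that would loop forever.
            raise ValueError("cycle in predecessor table")
        path.append(prev)
        seen.add(prev)
    path.reverse()  # the walk runs target -> source, so flip it
    return path


# Hypothetical table: vertex 0 is the source, with edges 0 -> 1, 1 -> 2, 1 -> 3.
predecessor = {0: -1, 1: 0, 2: 1, 3: 1}
example_path = path_to(2, predecessor)
```

With a GPU DataFrame, the two columns would first have to be brought to the host (or the walk expressed as repeated joins); this sketch only illustrates the back-tracking logic.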

0 answers

cuDF for string comparison boosting

I am working on finding matches between 2 large CSV files. I use this function to compute the similarity between 2 strings. If the resulting ratio is greater than a predefined threshold, then I will accept this as a match. def similar(a, b): return…

python cudf

asked Sep 28 '20 at 02:58

Wang Hao

2 answers

GPU processing - cuDF install problem (O/S or hardware issue?)

My aim is to explore GPU acceleration for tabular data with 10,000 to 10M+ records. I am most familiar with Pandas, so cuDF seems like a good place to start. I'm finding mixed results re: whether cuDF will run on my system (Windows 7 Pro 64-bit,…

python python-3.x windows rapids cudf

asked Aug 26 '20 at 21:28

CreekGeek

1 answer

Warning with CUDF/Python: "User Warning: No NVIDIA GPU detected"

I am having some difficulty running code with the cudf and dask_cudf modules in python. I am working on Jupyter Labs through Anaconda. I have been able to correctly install my nvidia-gpu driver, cudf (through rapidsai), and cuda. Only, when I go to…

python cuda dask rapids cudf

asked Jul 13 '20 at 16:45

Maggie

1 answer

Need Help In Converting cuDF Dataframe to cupy ndarray

I want to convert a cuDF dataframe to cupy ndarray. I'm using this code below: import time import numpy as np import cupy as cp import cudf from numba import cuda df = cudf.read_csv('titanic.csv') arr_cupy =…

python nvidia cupy rapids cudf

asked May 07 '20 at 14:59

Md Kaish Ansari

1 answer

Does cudf support get_dummies?

Does cudf support the pandas get_dummies? In pandas I can do the following:
>>> s = pd.Series(list('abca'))
>>> pd.get_dummies(s)
   a  b  c
0  1  0  0
1  0  1  0
2  0  0  1
3  1  0  0

pandas cudf

asked Nov 12 '19 at 16:35

quasiben

1 answer

How to pre-cache dask.dataframe to all workers and partitions to reduce communication need

It’s sometimes appealing to use dask.dataframe.map_partitions for operations like merges. In some scenarios, when doing merges between a left_df and a right_df using map_partitions, I’d like to essentially pre-cache right_df before executing the…

python pandas dask rapids cudf

asked Jul 30 '19 at 14:47

Nick Becker

1 answer

How much overhead is there per partition when loading dask_cudf partitions into GPU memory?

PCIE bus bandwidth latencies force constraints on how and when applications should copy data to and from GPUs. When working with cuDF directly, I can efficiently move a single large chunk of data into a single DataFrame. When using dask_cudf to…

dask rapids cudf

asked Feb 14 '19 at 18:41

Randy Gelhausen

1 answer

How to parallel GPU processing of Dask dataframe

I would like to use dask to parallelize the data processing for dask cudf from Jupyter notebook on multiple GPUs. import cudf from dask.distributed import Client, wait, get_worker, get_client from dask_cuda import LocalCUDACluster import…

gpu dask dask-distributed rapids cudf

asked Jun 21 '23 at 00:40

mtnt

1 answer

NVidia Rapids filter neither works nor raises warn/errors

I am using Rapids 23.04 and trying to select reading from parquet/orc files based on select columns and rows. However, strangely the row filter is not working and I am unable to find the cause. Any help would be greatly appreciated. A proof of…

python-3.x dask rapids cudf

asked May 17 '23 at 15:22

Quiescent
