Questions tagged [cudf]

Use this tag for questions specifically related to the cuDF library or to cuDF DataFrame manipulations.

From PyPI: The RAPIDS cuDF library is a GPU DataFrame manipulation library based on Apache Arrow that accelerates loading, filtering, and manipulation of data for model training data preparation. The RAPIDS GPU DataFrame provides a pandas-like API that will be familiar to data scientists, so they can now build GPU-accelerated workflows more easily.

146 questions
2
votes
1 answer

How to use RAPIDS in Colab easily

When I use RAPIDS (cuDF) in Colaboratory, I execute the commands as follows. But these commands usually take about 20 minutes, so I have to wait every time to use…
nish
  • 49
  • 1
  • 2
2
votes
2 answers

How to read a large file as Pandas dataframe?

I want to read a large file (4GB) as a Pandas dataframe. Since using Dask directly still consumes maximum CPU, I read the file as a pandas dataframe, then use dask_cudf, and then convert back to a pandas dataframe. However, my code is still using…
melolilili
  • 199
  • 1
  • 3
  • 11
2
votes
1 answer

Rapids.ai / difference of computation with log between Pandas and cudf

Here is my code for comparing cuDF and pandas performance: gpuDF2 = cudf.DataFrame({'col_1': np.arange(0, 10_000_000), 'col_2': np.arange(0, 10_000_000)}) pandasDF2= pd.DataFrame({'col_1':np.arange(0,10_000_000),…
fransua
  • 501
  • 2
  • 18
2
votes
0 answers

Using cudf to apply a function on strings

I am trying to use cuDF and just learned that .apply is not supported with strings. I am currently trying to apply this function: def ngrams(tweet): test = [x for x in re.sub(r'[^\w\s]','', tweet).split() if not x in…
2
votes
2 answers

Install cudf without conda

On Google Colab I installed conda and then cuDF through conda. However, now I need to reinstall all the packages, like sklearn, that I am using in my code. Is there some way to install cuDF without conda? pip no longer works with cuDF. Also if there is…
laser
  • 87
  • 1
  • 8
2
votes
1 answer

How to convert vertex-predecessor dataframe to path?

I am using cuGraph to calculate the shortest path of a graph, but instead of returning the shortest path to a particular vertex, it creates a distance-vertex-predecessor table: distance vertex predecessor 3935 0.000000 0 …
Tom McLean
  • 5,583
  • 1
  • 11
  • 36
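The predecessor table described in this excerpt can be turned into an explicit path by walking backwards from the target vertex. The following is a minimal sketch in plain Python, not cuGraph's own API: the function name `path_from_predecessors`, the dictionary input, and the convention that the source vertex has predecessor `-1` are all assumptions made for illustration, and the target is assumed to be reachable.

```python
def path_from_predecessors(predecessor, target, source_marker=-1):
    """Rebuild the path to `target` from a vertex -> predecessor mapping.

    `predecessor` is a plain dict (hypothetical input format).
    `source_marker` is the value assumed to mark the source vertex.
    """
    path = [target]
    seen = {target}
    current = target
    while True:
        if current not in predecessor:
            raise KeyError(f"vertex {current!r} is missing from the table")
        previous = predecessor[current]
        if previous == source_marker:
            break  # reached the source vertex
        if previous in seen:
            raise ValueError("cycle detected in predecessor table")
        path.append(previous)
        seen.add(previous)
        current = previous
    path.reverse()  # the walk collected vertices from target back to source
    return path


# Toy table: vertex 0 is the source, and 4 is reached via 0 -> 1 -> 3 -> 4.
predecessor = {0: -1, 1: 0, 2: 1, 3: 1, 4: 3}
example_path = path_from_predecessors(predecessor, 4)
print(example_path)
```

With a GPU DataFrame, the mapping would first have to be copied to the host (for example via `to_pandas()`), which is reasonable for a single path but may not scale to many targets.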
2
votes
0 answers

cuDF for string comparison boosting

I am working on finding matches between two large CSV files. I use this function to compute the similarity between two strings. If the resulting ratio is greater than a predefined threshold, I accept it as a match. def similar(a, b): return…
Wang Hao
  • 43
  • 3
2
votes
2 answers

GPU processing - cuDF install problem (O/S or hardware issue?)

My aim is to explore GPU acceleration for tabular data with 10,000 to 10M+ records. I am most familiar with Pandas, so cuDF seems like a good place to start. I'm finding mixed results re: whether cuDF will run on my system (Windows 7 Pro 64-bit,…
CreekGeek
  • 1,809
  • 2
  • 14
  • 24
2
votes
1 answer

Warning with CUDF/Python: "User Warning: No NVIDIA GPU detected"

I am having some difficulty running code with the cudf and dask_cudf modules in Python. I am working in JupyterLab through Anaconda. I have been able to correctly install my NVIDIA GPU driver, cuDF (through rapidsai), and CUDA. Only, when I go to…
Maggie
  • 23
  • 5
2
votes
1 answer

Need Help In Converting cuDF Dataframe to cupy ndarray

I want to convert a cuDF dataframe to cupy ndarray. I'm using this code below: import time import numpy as np import cupy as cp import cudf from numba import cuda df = cudf.read_csv('titanic.csv') arr_cupy =…
Md Kaish Ansari
  • 251
  • 2
  • 7
2
votes
1 answer

Does cudf support get_dummies?

Does cuDF support the pandas get_dummies function? In pandas I can do the following:

    >>> s = pd.Series(list('abca'))
    >>> pd.get_dummies(s)
       a  b  c
    0  1  0  0
    1  0  1  0
    2  0  0  1
    3  1  0  0
quasiben
  • 1,444
  • 1
  • 11
  • 19
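For readers unfamiliar with what `get_dummies` computes, the one-hot encoding in this excerpt can be sketched in plain Python. This is only an illustration of the encoding itself, not how pandas or cuDF implement it; the helper name `one_hot` and its return format are hypothetical.

```python
def one_hot(values):
    """Return (columns, rows) for a one-hot encoding of `values`.

    Columns are the sorted distinct values; each row holds a 1 in the
    column matching that row's value and 0 elsewhere.
    """
    columns = sorted(set(values))
    rows = [[1 if value == column else 0 for column in columns]
            for value in values]
    return columns, rows


# Same input as the excerpt: the series ['a', 'b', 'c', 'a'].
columns, rows = one_hot(list("abca"))
print(columns)
for index, row in enumerate(rows):
    print(index, row)
```

The rows produced here correspond one-to-one with the indicator table shown in the question.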
2
votes
1 answer

How to pre-cache dask.dataframe to all workers and partitions to reduce communication need

It’s sometimes appealing to use dask.dataframe.map_partitions for operations like merges. In some scenarios, when doing merges between a left_df and a right_df using map_partitions, I’d like to essentially pre-cache right_df before executing the…
Nick Becker
  • 4,059
  • 13
  • 19
2
votes
1 answer

How much overhead is there per partition when loading dask_cudf partitions into GPU memory?

PCIe bus bandwidth and latency impose constraints on how and when applications should copy data to and from GPUs. When working with cuDF directly, I can efficiently move a single large chunk of data into a single DataFrame. When using dask_cudf to…
Randy Gelhausen
  • 125
  • 1
  • 5
1
vote
1 answer

How to parallelize GPU processing of a Dask dataframe

I would like to use Dask to parallelize data processing with dask_cudf from a Jupyter notebook on multiple GPUs. import cudf from dask.distributed import Client, wait, get_worker, get_client from dask_cuda import LocalCUDACluster import…
mtnt
  • 31
  • 5
1
vote
1 answer

NVIDIA RAPIDS filter neither works nor raises warnings/errors

I am using RAPIDS 23.04 and trying to read selected columns and rows from Parquet/ORC files. However, strangely, the row filter is not working and I am unable to find the cause. Any help would be greatly appreciated. A proof of…
Quiescent
  • 1,088
  • 7
  • 18