Questions tagged [cudf]

Use this tag for questions specifically related to the cuDF library or to cuDF DataFrame manipulations.

From PyPI: The RAPIDS cuDF library is a GPU DataFrame manipulation library based on Apache Arrow that accelerates loading, filtering, and manipulation of data for model training data preparation. The RAPIDS GPU DataFrame provides a pandas-like API that will be familiar to data scientists, so they can now build GPU-accelerated workflows more easily.

146 questions
0
votes
1 answer

Fill Forward cupy / cudf

Should it be possible to execute a fill forward with cupy/cudf? The idea is to execute a Schmitt trigger function, something like: # pandas version df = some_random_vector on_off = (df>.3)*1 + (df<.3)*-1 on_off[on_off==0] = np.nan on_off =…
user2559936
  • 97
  • 3
  • 6
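For the fill-forward question above, a minimal sketch of one approach, assuming a recent cuDF release whose Series provides where() and ffill(); the threshold and the random input are made up for illustration:

    import cupy as cp
    import cudf

    # hypothetical input signal
    s = cudf.Series(cp.random.rand(10))

    # Schmitt-trigger-style state: +1 above the threshold, -1 below it
    on_off = (s > 0.3).astype("int8") - (s < 0.3).astype("int8")

    # turn the undecided samples (0) into nulls, then forward-fill the
    # last known state down the series
    on_off = on_off.where(on_off != 0)
    on_off = on_off.ffill()
    print(on_off)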
0
votes
1 answer

cuDF only using a single GPU to load data

I have a large file that I want to load using cudf.read_csv(). The file in question is too large to fit in a single GPU's memory, but still small enough to fit into CPU memory. I can load the file with pd.read_csv(), but it takes forever! In…
Ottpocket
  • 77
  • 12
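For the question above, one hedged way to spread a too-large CSV over several GPUs is to go through dask_cudf instead of cudf directly; a minimal sketch, assuming dask_cuda is installed and using a hypothetical file path:

    from dask_cuda import LocalCUDACluster
    from dask.distributed import Client
    import dask_cudf

    # one worker per visible GPU
    cluster = LocalCUDACluster()
    client = Client(cluster)

    # the CSV is read as partitions spread across the workers, so no
    # single GPU has to hold the whole file at once
    ddf = dask_cudf.read_csv("large_file.csv")
    print(ddf.head())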
0
votes
2 answers

MemoryError: std::bad_alloc: rapids.ai Dask-cuDF

I would like to load a 5.9 GB CSV and I don't use the pandas library. I have 4 GPUs. I use rapids.ai to load this large dataset faster, but every time I try, this error is shown to me, although I have space in my other GPU memory. memory usage of…
Omid Erfanmanesh
  • 547
  • 1
  • 7
  • 29
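For the bad_alloc question above, one knob that often matters is dask_cuda's device_memory_limit, which lets each worker spill GPU memory to host RAM instead of failing; a minimal sketch with an illustrative limit and a hypothetical file path:

    from dask_cuda import LocalCUDACluster
    from dask.distributed import Client
    import dask_cudf

    # per-worker threshold at which Dask starts spilling device memory
    # to host RAM; the value here is illustrative and depends on the card
    cluster = LocalCUDACluster(device_memory_limit="10GB")
    client = Client(cluster)

    ddf = dask_cudf.read_csv("data.csv")   # hypothetical path
    print(len(ddf))                        # forces the partitions to load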
0
votes
1 answer

Semantic versioning in dask repository

Why didn't the commit 7138f470f0e55f2ebdb7638ddc4dfe2e78671403 trigger a new major version of dask, since the function read_metadata is incompatible with older versions? The commit introduced the return of 4 values, but the old version only…
JulianWgs
  • 961
  • 1
  • 14
  • 25
0
votes
2 answers

Most efficient way of multi groupby count activities on large datasets

I am trying to find subsets (of any length) of attribute (column) values which are unique in a given dataset. To the best of my knowledge, the most efficient way to find those is by computing multiple (many) groupby activities counting the…
Reacher234
  • 230
  • 2
  • 11
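A brute-force sketch of the idea behind the question above, with a made-up frame; here a column subset is treated as "unique" when every group it defines contains exactly one row:

    from itertools import combinations
    import cudf

    df = cudf.DataFrame({
        "a": [1, 1, 2, 2],
        "b": ["x", "y", "x", "y"],
        "c": [10, 10, 30, 40],
    })

    for r in range(1, len(df.columns) + 1):
        for cols in combinations(df.columns, r):
            # one groupby count per candidate subset
            counts = df.groupby(list(cols)).size()
            if int(counts.max()) == 1:
                print(cols, "is unique in this dataset")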
0
votes
1 answer

I am trying to install cuDF from source for conda; I cannot use cmake to install it

I am trying to install cuDF from its source as given on the page (https://github.com/rapidsai/cudf/blob/branch-0.15/CONTRIBUTING.md#setting-up-your-build-environment). After following a few steps, I cannot complete it by doing the…
Krishnaap
  • 297
  • 3
  • 18
0
votes
1 answer

cuDF - Not leveraging GPU cores

I am running the below piece of code in Python with cuDF to speed up the process, but I do not see any difference in speed compared to my 4-core local machine CPU. The GPU configuration is 4 x NVIDIA Tesla T4. def arima(train): h = [] for each…
Jack Daniel
  • 2,527
  • 3
  • 31
  • 52
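Part of the usual explanation for the question above is that plain Python loops over a cuDF frame still run element by element on the CPU; only whole-column operations are dispatched as GPU kernels. A minimal illustrative sketch (the data is made up, and this does not reproduce the ARIMA loop itself):

    import numpy as np
    import cudf

    df = cudf.DataFrame({"x": np.arange(100_000, dtype="float64")})

    # CPU-bound: the Python interpreter touches each element in turn
    slow = sum(v * 2.0 for v in df["x"].to_pandas())

    # GPU-bound: a single vectorized kernel over the whole column
    fast = float((df["x"] * 2.0).sum())

    assert abs(slow - fast) < 1e-6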
0
votes
1 answer

Buffer size must be divisible by element size while converting pandas dataframe to cudf dataframe

I have a data frame with a column of comma-separated values encoded with quotes, i.e., a string object. Ex: df['a'] '1,2,3,4,5' '2,3,4,5,6' I am able to convert the string-formatted list of values to a NumPy array and able to do my operation…
Jack Daniel
  • 2,527
  • 3
  • 31
  • 52
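One hedged way to sidestep the conversion error in the question above is to move the frame to the GPU first and split the comma-separated strings there; a minimal sketch, reusing the hypothetical column name "a" and assuming a cuDF str.split that accepts expand=True:

    import pandas as pd
    import cudf

    pdf = pd.DataFrame({"a": ["1,2,3,4,5", "2,3,4,5,6"]})
    gdf = cudf.from_pandas(pdf)

    # split each string into its own numeric columns on the GPU
    parts = gdf["a"].str.split(",", expand=True).astype("int64")
    print(parts)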
0
votes
1 answer

Is there a faster/optimisable way to find unique combinations from a set/list of unique elements in python

I am trying to find all the possible unique combinations out of n elements, taken m at a time. I have used itertools.combinations for the same, and I have n=85. So when I'm finding combinations for m=5, the number of combinations produced is about 3…
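For the question above, the count can be checked up front without materializing anything; a minimal sketch using the n and m from the question and only the standard library:

    from itertools import combinations, islice
    from math import comb

    n, m = 85, 5
    print(comb(n, m))   # 32,801,517 combinations for m = 5

    # iterate lazily instead of building a 32-million-element list
    for combo in islice(combinations(range(n), m), 3):
        print(combo)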
0
votes
1 answer

cuDF - groupby UDF to support datetime

I have a cuDF dataframe with the following columns: columns = ["col1", "col2", "dt"]. The dt column is in the form of datetime64[ns]. I would like to write a UDF to apply to each group in this dataframe, and get the max of dt for each group. Here is what I am…
khan
  • 7,005
  • 15
  • 48
  • 70
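If the goal in the question above is just the per-group maximum of the datetime column, the built-in groupby aggregation avoids the UDF datetime limitations; a minimal sketch with made-up data, reusing the column names from the question:

    import pandas as pd
    import cudf

    pdf = pd.DataFrame({
        "col1": ["a", "a", "b"],
        "col2": [1, 2, 3],
        "dt": pd.to_datetime(["2020-01-01", "2020-01-03", "2020-01-02"]),
    })
    gdf = cudf.from_pandas(pdf)

    # max of dt within each col1 group, no UDF required
    out = gdf.groupby("col1").agg({"dt": "max"}).reset_index()
    print(out)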
0
votes
1 answer

How to do a matrix dot product between two DataFrame in the GPU with rapids.ai

I'm using cuDF, which is part of the RAPIDS ML suite from NVIDIA. Using this suite, how would I do a dot product between two DataFrames? a = cudf.DataFrame([[0.1, 0.2, 0.3, 0.4], [0.1, 0.2, 0.3, 0.4]]) b = cudf.DataFrame([[0.1, 0.2], [0.1,…
MrJasonLi
  • 21
  • 3
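A minimal sketch of one common route for the question above, assuming a recent cuDF where DataFrame.values returns a CuPy array (and where a 2-D CuPy array can be handed back to the DataFrame constructor), so the matrix product stays on the GPU; the shapes mirror the question:

    import cudf

    a = cudf.DataFrame({"c0": [0.1, 0.1], "c1": [0.2, 0.2],
                        "c2": [0.3, 0.3], "c3": [0.4, 0.4]})
    b = cudf.DataFrame({"d0": [0.1, 0.1, 0.1, 0.1],
                        "d1": [0.2, 0.2, 0.2, 0.2]})

    # .values yields CuPy ndarrays, so @ is computed on the GPU
    product = a.values @ b.values          # shape (2, 2)
    result = cudf.DataFrame(product)       # wrap back into a DataFrame
    print(result)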
0
votes
0 answers

Implement df.groupby('user')['item'].apply(np.array) in cuDF

Is there any way to replicate this simple pandas functionality in cuDF? Note that the array lengths vary. An example of the expected output using pandas and NumPy (CuPy in the cuDF case) can be found below: import pandas as pd import numpy as np df =…
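A hedged equivalent for the question above, assuming a cuDF version that supports the "collect" groupby aggregation (which gathers each group's values into a list column, much like apply(np.array) does in pandas); the data is made up:

    import cudf

    df = cudf.DataFrame({
        "user": [1, 1, 2, 2, 2],
        "item": [10, 11, 20, 21, 22],
    })

    # one list of items per user; list lengths may differ per group
    out = df.groupby("user").agg({"item": "collect"}).reset_index()
    print(out)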
0
votes
1 answer

Exception using CuDF apply_chunks - Use of unsupported NumPy function 'numpy.ones_like' or unsupported use of the function

I am trying to use NumPy from within Numba JIT-optimized code, but I am getting errors when I try to do standard NumPy operations like numpy.ones_like, even though the Numba documentation mentions that the operation is supported. Documentation…
Strider
  • 1
  • 5
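For the question above: the CUDA target used by apply_chunks does not provide array-allocating helpers such as numpy.ones_like, so a common workaround is to declare the output column up front and fill it elementwise inside the kernel. A minimal sketch, assuming a hypothetical input column "x" and the apply_chunks signature documented for older cuDF releases:

    import numpy as np
    import cudf
    from numba import cuda

    df = cudf.DataFrame({"x": [1.0, 2.0, 3.0, 4.0]})

    def kernel(x, out):
        # each thread block handles one chunk; threads stride over it
        for i in range(cuda.threadIdx.x, x.size, cuda.blockDim.x):
            out[i] = 1.0   # in place of numpy.ones_like(x)

    result = df.apply_chunks(kernel,
                             incols=["x"],
                             outcols={"out": np.float64},
                             chunks=2,
                             tpb=8)
    print(result)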
0
votes
1 answer

Lower-end GPU vs mid-range CPU for data processing

I currently have simple data processing that involves groupby, merge, and parallel column-to-column operations. The not-so-simple part is the massive number of rows used (it's detailed cost/financial data). It's 300-400 GB in size. Due to limited RAM,…
Ditto
  • 25
  • 4
0
votes
2 answers

Read a large csv as a Pandas DataFrame faster

I have a CSV that I am reading into a Pandas DataFrame, but it takes about 35 minutes to read. The CSV is approximately 120 GB. I found a module called cudf that allows a GPU DataFrame; however, it is only for Linux. Is there something similar for…
rzaratx
  • 756
  • 3
  • 9
  • 29
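As a CPU-only alternative for the question above that also works on Windows, plain dask.dataframe can read the CSV in parallel partitions; a minimal sketch with a hypothetical path and an illustrative block size:

    import dask.dataframe as dd

    # lazy, partitioned read; each partition is parsed in parallel
    ddf = dd.read_csv("big_file.csv", blocksize="256MB")
    print(ddf.head())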