Questions tagged [cudf]

Use this tag for questions specifically related to the cuDF library or to cuDF DataFrame manipulations.

From PyPI: The RAPIDS cuDF library is a GPU DataFrame manipulation library based on Apache Arrow that accelerates loading, filtering, and manipulation of data for model training data preparation. The RAPIDS GPU DataFrame provides a pandas-like API that will be familiar to data scientists, so they can now build GPU-accelerated workflows more easily.

146 questions
0
votes
1 answer

Fill Forward cupy / cudf

Should it be possible to execute a fill forward with cupy/cudf? The idea is to execute a Schmitt trigger function, something like: # pandas version df = some_random_vector on_off = (df>.3)*1 + (df<.3)*-1 on_off[on_off==0] = np.nan on_off =…
user2559936
  • 97
  • 3
  • 6
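For the fill-forward question above, a minimal sketch of one approach, assuming a recent cuDF release whose Series provides where() and ffill(); the threshold and the random input are made up for illustration:

    import cupy as cp
    import cudf

    # hypothetical input signal
    s = cudf.Series(cp.random.rand(10))

    # Schmitt-trigger-style state: +1 above the threshold, -1 below it
    on_off = (s > 0.3).astype("int8") - (s < 0.3).astype("int8")

    # turn the undecided samples (0) into nulls, then forward-fill the
    # last known state down the series
    on_off = on_off.where(on_off != 0)
    on_off = on_off.ffill()
    print(on_off)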
0
votes
1 answer

cuDF only using a single GPU to load data

I have a large file that I want to load using cudf.read_csv(). The file in question is too large to fit in a single GPU's memory, but still small enough to fit into CPU memory. I can load the file with pd.read_csv(), but it takes forever! In…
Ottpocket
  • 77
  • 12
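For the question above, one hedged way to spread a too-large CSV over several GPUs is to go through dask_cudf instead of cudf directly; a minimal sketch, assuming dask_cuda is installed and using a hypothetical file path:

    from dask_cuda import LocalCUDACluster
    from dask.distributed import Client
    import dask_cudf

    # one worker per visible GPU
    cluster = LocalCUDACluster()
    client = Client(cluster)

    # the CSV is read as partitions spread across the workers, so no
    # single GPU has to hold the whole file at once
    ddf = dask_cudf.read_csv("large_file.csv")
    print(ddf.head())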
0
votes
2 answers

MemoryError: std::bad_alloc: rapids.ai Dask-cuDF

I would like to load a 5.9 GB CSV and I don't use the pandas library. I have 4 GPUs. I use rapids.ai to load this large dataset faster, but every time I try, this error is shown to me, although I have space in my other GPU memory. memory usage of…
Omid Erfanmanesh
  • 547
  • 1
  • 7
  • 29
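For the bad_alloc question above, one knob that often matters is dask_cuda's device_memory_limit, which lets each worker spill GPU memory to host RAM instead of failing; a minimal sketch with an illustrative limit and a hypothetical file path:

    from dask_cuda import LocalCUDACluster
    from dask.distributed import Client
    import dask_cudf

    # per-worker threshold at which Dask starts spilling device memory
    # to host RAM; the value here is illustrative and depends on the card
    cluster = LocalCUDACluster(device_memory_limit="10GB")
    client = Client(cluster)

    ddf = dask_cudf.read_csv("data.csv")   # hypothetical path
    print(len(ddf))                        # forces the partitions to load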
0
votes
1 answer

Semantic versioning in dask repository

Why didn't the commit 7138f470f0e55f2ebdb7638ddc4dfe2e78671403 trigger a new major version of dask, since the function read_metadata is incompatible with older versions? The commit introduced the return of 4 values, but the old version only…
JulianWgs
  • 961
  • 1
  • 14
  • 25
0
votes
2 answers

Most efficient way of multi groupby count activities on large datasets

I am trying to find subsets (of any length) of attribute (column) values which are unique in a given dataset. To the best of my knowledge, the most efficient way to find those is by computing multiple (many) groupby activities counting the…
Reacher234
  • 230
  • 2
  • 11
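A brute-force sketch of the idea behind the question above, with a made-up frame; here a column subset is treated as "unique" when every group it defines contains exactly one row:

    from itertools import combinations
    import cudf

    df = cudf.DataFrame({
        "a": [1, 1, 2, 2],
        "b": ["x", "y", "x", "y"],
        "c": [10, 10, 30, 40],
    })

    for r in range(1, len(df.columns) + 1):
        for cols in combinations(df.columns, r):
            # one groupby count per candidate subset
            counts = df.groupby(list(cols)).size()
            if int(counts.max()) == 1:
                print(cols, "is unique in this dataset")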
0
votes
1 answer

I am trying to install cuDF from source for conda; I cannot use cmake to install it

I am trying to install cuDF from its source as given on the page (https://github.com/rapidsai/cudf/blob/branch-0.15/CONTRIBUTING.md#setting-up-your-build-environment). After following a few steps, I cannot complete it by doing the…
Krishnaap
  • 297
  • 3
  • 18
0
votes
1 answer

cuDF - Not leveraging GPU cores

I am running the below piece of code in Python with cuDF to speed up the process, but I do not see any difference in speed compared to my 4-core local machine CPU. The GPU configuration is 4 x NVIDIA Tesla T4. def arima(train): h = [] for each…
Jack Daniel
  • 2,527
  • 3
  • 31
  • 52
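Part of the usual explanation for the question above is that plain Python loops over a cuDF frame still run element by element on the CPU; only whole-column operations are dispatched as GPU kernels. A minimal illustrative sketch (the data is made up, and this does not reproduce the ARIMA loop itself):

    import numpy as np
    import cudf

    df = cudf.DataFrame({"x": np.arange(100_000, dtype="float64")})

    # CPU-bound: the Python interpreter touches each element in turn
    slow = sum(v * 2.0 for v in df["x"].to_pandas())

    # GPU-bound: a single vectorized kernel over the whole column
    fast = float((df["x"] * 2.0).sum())

    assert abs(slow - fast) < 1e-6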
0
votes
1 answer

Buffer size must be divisible by element size while converting pandas dataframe to cudf dataframe

I have a data frame with a column of comma-separated values encoded with quotes, i.e., a string object. Ex: df['a'] '1,2,3,4,5' '2,3,4,5,6' I am able to convert the string-formatted list of values to a NumPy array and able to do my operation…
Jack Daniel
  • 2,527
  • 3
  • 31
  • 52
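One hedged way to sidestep the conversion error in the question above is to move the frame to the GPU first and split the comma-separated strings there; a minimal sketch, reusing the hypothetical column name "a" and assuming a cuDF str.split that accepts expand=True:

    import pandas as pd
    import cudf

    pdf = pd.DataFrame({"a": ["1,2,3,4,5", "2,3,4,5,6"]})
    gdf = cudf.from_pandas(pdf)

    # split each string into its own numeric columns on the GPU
    parts = gdf["a"].str.split(",", expand=True).astype("int64")
    print(parts)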
0
votes
1 answer

Is there a faster/optimisable way to find unique combinations from a set/list of unique elements in python

I am trying to find all the possible unique combinations out of n elements, taken m at a time. I have used itertools.combinations for the same, and I have n=85. So when I'm finding combinations for m=5, the number of combinations produced is about 3…
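For the question above, the count can be checked up front without materializing anything; a minimal sketch using the n and m from the question and only the standard library:

    from itertools import combinations, islice
    from math import comb

    n, m = 85, 5
    print(comb(n, m))   # 32,801,517 combinations for m = 5

    # iterate lazily instead of building a 32-million-element list
    for combo in islice(combinations(range(n), m), 3):
        print(combo)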
0
votes
1 answer

cuDF - groupby UDF to support datetime

I have a cuDF dataframe with the following columns: columns = ["col1", "col2", "dt"]. The dt column is in the form of datetime64[ns]. I would like to write a UDF to apply to each group in this dataframe, and get the max of dt for each group. Here is what I am…
khan
  • 7,005
  • 15
  • 48
  • 70
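If the goal in the question above is just the per-group maximum of the datetime column, the built-in groupby aggregation avoids the UDF datetime limitations; a minimal sketch with made-up data, reusing the column names from the question:

    import pandas as pd
    import cudf

    pdf = pd.DataFrame({
        "col1": ["a", "a", "b"],
        "col2": [1, 2, 3],
        "dt": pd.to_datetime(["2020-01-01", "2020-01-03", "2020-01-02"]),
    })
    gdf = cudf.from_pandas(pdf)

    # max of dt within each col1 group, no UDF required
    out = gdf.groupby("col1").agg({"dt": "max"}).reset_index()
    print(out)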
0
votes
1 answer

How to do a matrix dot product between two DataFrame in the GPU with rapids.ai

I'm using cuDF, which is part of the RAPIDS ML suite from NVIDIA. Using this suite, how would I do a dot product between two DataFrames? a = cudf.DataFrame([[0.1, 0.2, 0.3, 0.4], [0.1, 0.2, 0.3, 0.4]]) b = cudf.DataFrame([[0.1, 0.2], [0.1,…
MrJasonLi
  • 21
  • 3
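A minimal sketch of one common route for the question above, assuming a recent cuDF where DataFrame.values returns a CuPy array (and where a 2-D CuPy array can be handed back to the DataFrame constructor), so the matrix product stays on the GPU; the shapes mirror the question:

    import cudf

    a = cudf.DataFrame({"c0": [0.1, 0.1], "c1": [0.2, 0.2],
                        "c2": [0.3, 0.3], "c3": [0.4, 0.4]})
    b = cudf.DataFrame({"d0": [0.1, 0.1, 0.1, 0.1],
                        "d1": [0.2, 0.2, 0.2, 0.2]})

    # .values yields CuPy ndarrays, so @ is computed on the GPU
    product = a.values @ b.values          # shape (2, 2)
    result = cudf.DataFrame(product)       # wrap back into a DataFrame
    print(result)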
0
votes
0 answers

Implement df.groupby('user')['item'].apply(np.array) in cuDF

Is there any way to replicate this simple pandas functionality in cuDF? Note that the array lengths vary. An example of the expected output using pandas and NumPy (CuPy in the cuDF case) can be found below: import pandas as pd import numpy as np df =…
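A hedged equivalent for the question above, assuming a cuDF version that supports the "collect" groupby aggregation (which gathers each group's values into a list column, much like apply(np.array) does in pandas); the data is made up:

    import cudf

    df = cudf.DataFrame({
        "user": [1, 1, 2, 2, 2],
        "item": [10, 11, 20, 21, 22],
    })

    # one list of items per user; list lengths may differ per group
    out = df.groupby("user").agg({"item": "collect"}).reset_index()
    print(out)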
0
votes
1 answer

Exception using CuDF apply_chunks - Use of unsupported NumPy function 'numpy.ones_like' or unsupported use of the function

I am trying to use NumPy from within Numba JIT-optimized code, but I am getting errors when I try to do standard NumPy operations like numpy.ones_like, even though the Numba documentation mentions that the operation is supported. Documentation…
Strider
  • 1
  • 5
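For the question above: the CUDA target used by apply_chunks does not provide array-allocating helpers such as numpy.ones_like, so a common workaround is to declare the output column up front and fill it elementwise inside the kernel. A minimal sketch, assuming a hypothetical input column "x" and the apply_chunks signature documented for older cuDF releases:

    import numpy as np
    import cudf
    from numba import cuda

    df = cudf.DataFrame({"x": [1.0, 2.0, 3.0, 4.0]})

    def kernel(x, out):
        # each thread block handles one chunk; threads stride over it
        for i in range(cuda.threadIdx.x, x.size, cuda.blockDim.x):
            out[i] = 1.0   # in place of numpy.ones_like(x)

    result = df.apply_chunks(kernel,
                             incols=["x"],
                             outcols={"out": np.float64},
                             chunks=2,
                             tpb=8)
    print(result)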
0
votes
1 answer

Lower-end GPU vs mid-range CPU for data processing

I currently have simple data processing that involves groupby, merge, and parallel column-to-column operations. The not-so-simple part is the massive number of rows used (it's detailed cost/financial data). It's 300-400 GB in size. Due to limited RAM,…
Ditto
  • 25
  • 4
0
votes
2 answers

Read a large csv as a Pandas DataFrame faster

I have a CSV that I am reading into a Pandas DataFrame, but it takes about 35 minutes to read. The CSV is approximately 120 GB. I found a module called cudf that allows a GPU DataFrame; however, it is only for Linux. Is there something similar for…
rzaratx
  • 756
  • 3
  • 9
  • 29
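As a CPU-only alternative for the question above that also works on Windows, plain dask.dataframe can read the CSV in parallel partitions; a minimal sketch with a hypothetical path and an illustrative block size:

    import dask.dataframe as dd

    # lazy, partitioned read; each partition is parsed in parallel
    ddf = dd.read_csv("big_file.csv", blocksize="256MB")
    print(ddf.head())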