Questions tagged [cudf]

Use this tag for questions specifically related to the cuDF library or to cuDF DataFrame manipulation.

From PyPI: The RAPIDS cuDF library is a GPU DataFrame manipulation library based on Apache Arrow that accelerates loading, filtering, and manipulation of data for model training data preparation. The RAPIDS GPU DataFrame provides a pandas-like API that will be familiar to data scientists, so they can now build GPU-accelerated workflows more easily.
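Because cuDF follows the pandas API, the same DataFrame code often runs on either library. The following is a minimal sketch that only assumes one of the two libraries is installed. It imports cuDF when available and falls back to pandas otherwise. The `key`/`val` columns are made-up example data.

```python
# Use cuDF when available (needs an NVIDIA GPU and a RAPIDS install);
# otherwise fall back to pandas so the same code still runs on the CPU.
try:
    import cudf as xdf
except ImportError:
    import pandas as xdf

# Hypothetical example data.
df = xdf.DataFrame({"key": ["a", "b", "a", "b"], "val": [1.0, 2.0, 3.0, 4.0]})

# Filtering and grouping use the same calls in both libraries.
filtered = df[df["val"] > 1.5]
means = filtered.groupby("key")["val"].mean().sort_index()

# cuDF objects live in GPU memory; to_pandas() copies them back to the host.
means_host = means.to_pandas() if hasattr(means, "to_pandas") else means
print(means_host)
```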

146 questions
1 vote, 1 answer

Google Colab: cannot install cudf

I need help. I am using Google Colab with Python 3.10.11 on a runtime with CUDA version 12.0 and NVIDIA driver version 525.85.12, and I am following this tutorial on how to install cuDF …
Nata107
1 vote, 0 answers

Error converting a pandas DataFrame to a cuDF DataFrame

I would like to convert a pandas data frame to a cudf data frame on linux. My code: import cudf import pandas as pd test_data = { 'session_id':[1, 2], 'val' : [1.1, 2.2] } pd_df = pd.DataFrame(test_data) …
mtnt
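The usual conversion route is `cudf.from_pandas` (also available as `cudf.DataFrame.from_pandas`), with `to_pandas()` for the way back. The following is a minimal sketch using the question's sample data. It falls back to the pandas frame itself when cuDF cannot be imported, so it only demonstrates the conversion calls and does not diagnose the specific error in the question.

```python
import pandas as pd

try:
    import cudf  # requires an NVIDIA GPU and a RAPIDS install
except ImportError:
    cudf = None

test_data = {"session_id": [1, 2], "val": [1.1, 2.2]}
pd_df = pd.DataFrame(test_data)

if cudf is not None:
    gdf = cudf.from_pandas(pd_df)  # copy host data into GPU memory
    round_trip = gdf.to_pandas()   # copy it back to a pandas DataFrame
else:
    # No cuDF available: keep working with the pandas frame.
    round_trip = pd_df.copy()

print(round_trip.dtypes)
```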
1 vote, 0 answers

dask_cudf/dask read_parquet failed with NotImplementedError: large_string

I am a new user of dask/dask_cudf. I have Parquet files of various sizes (11GB, 2.5GB, 1.1GB), all of which fail with NotImplementedError: large_string. My dask.dataframe backend is cudf. When the backend is pandas, read_parquet works…
stucash
1 vote, 0 answers

dask_cudf dataframe convert column of datetime string to column of datetime object

I am a new user of Dask and RapidsAI. An excerpt of my data (in csv format): Symbol,Date,Open,High,Low,Close,Volume AADR,17-Oct-2017 09:00,57.47,58.3844,57.3645,58.3844,2094 AADR,17-Oct-2017 10:00,57.27,57.2856,57.25,57.27,627 AADR,17-Oct-2017…
stucash
1 vote, 0 answers

How to run query with lists and sets in cuDF

I am using cudf (dask-cudf) to handle tens~billions of data for social media. I'm trying to use query in extracting only the relevant users from the mother data set. However, unlike pandas, cudf's query will error if I pass in a list or set. The…
felntc
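A common workaround is to skip `query` and build a boolean mask with `Series.isin`, which cuDF provides with a pandas-compatible signature. The following is a minimal sketch. The `user_id`/`likes` columns and the data are hypothetical. Converting the set to a list first is a cautious assumption, not a documented requirement. Dask DataFrames also expose `isin`, so the same mask pattern can be tried on a dask_cudf frame.

```python
try:
    import cudf as xdf  # GPU DataFrame (requires RAPIDS)
except ImportError:
    import pandas as xdf  # CPU fallback with the same API for this example

# Hypothetical data: one row per post, keyed by user.
df = xdf.DataFrame({"user_id": [10, 11, 12, 13, 11], "likes": [5, 3, 8, 1, 7]})
relevant_users = {11, 13}

# Boolean mask instead of df.query("user_id in @relevant_users").
mask = df["user_id"].isin(list(relevant_users))
subset = df[mask]

# Bring the (small) result back to host memory for inspection.
subset_host = subset.to_pandas() if hasattr(subset, "to_pandas") else subset
print(subset_host)
```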
1 vote, 1 answer

'Cannot convert value of type NotImplementedType to cudf scalar' appearing on trivial sort_values example in cudf 22.08, python 3.9

Apologies - I'm aware there's a similar question, however I'm new to SO, so I'm unable to comment underneath the answer. I'm having issues with sort_values in a vanilla install of cudf as per the RAPIDS website: conda create -n rapids-22.08 -c…
dcgt1
1 vote, 1 answer

cuDF not reading columns properly

I'm trying to read a CSV with cudf. It works nicely, but when I try to get the content of the columns, cudf does not seem to recognize them at all. It's very odd behavior. Here is the code: And here is the error: Any help, please? Thanks
1 vote, 0 answers

Using CuPy/cuDF, remove elements that are not distant enough to their previous elements from a sorted list

The purpose of the code is similar to this post I have a code that runs on CPUs: import pandas as pd def remove(s: pd.Series, thres:int): pivot = -float("inf") new_s = [] for e in s: if (e-pivot)>thres: …
dacapo1142
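The loop in the question is truncated, so the following sketch assumes the natural reading: keep an element only if it exceeds the last *kept* element by more than `thres`, then make it the new pivot. Because each decision depends on earlier decisions, this is a sequential scan. A one-shot vectorized comparison against the immediately preceding element (as a `diff`-based filter would do) answers a different question, as the example shows. `adjacent_diff_filter` is a hypothetical helper added for that comparison.

```python
from typing import Iterable, List


def remove(s: Iterable[float], thres: float) -> List[float]:
    """Greedy scan over sorted values (assumed completion of the question's loop)."""
    pivot = float("-inf")
    new_s: List[float] = []
    for e in s:
        if e - pivot > thres:
            new_s.append(e)
            pivot = e  # distance is measured from the last kept element
    return new_s


def adjacent_diff_filter(values: List[float], thres: float) -> List[float]:
    """Vectorizable alternative that compares only neighbouring elements."""
    if not values:
        return []
    return [values[0]] + [b for a, b in zip(values, values[1:]) if b - a > thres]


data = [0, 1, 2, 3, 4]
print(remove(data, 1.5))                # [0, 2, 4]
print(adjacent_diff_filter(data, 1.5))  # [0]
```

The two results differ because every neighbouring gap is 1, while the gap to the last kept element grows to 2.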
1 vote, 1 answer

Dask-cuDF to CuDF dataframe conversion

Is there any function that converts a Dask-cuDF dataframe to a cuDF dataframe, like from_cudf does for cudf to dask-cudf? dgdf = dask_cudf.from_cudf(df, npartitions=2)
1 vote, 1 answer

Extracting specific rows from a multi-indexed Pandas Dataframe to form new DataFrame

I have a data set that I am loading onto a Pandas dataframe that is a Jagged 3-D array called: Waveform. The dataframe is multi-indexed by three levels: Events (Entry), Photons (Subentry) generated by each event, data points (subsubentry) per…
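With a three-level MultiIndex, `Index.get_level_values` plus `isin` selects whole events and returns a new DataFrame, and `DataFrame.xs` takes a cross-section at one level. The following is a minimal pandas sketch. The level names `entry`, `subentry`, `subsubentry` and the column `amplitude` are assumptions modelled on the description, because the question is truncated.

```python
import pandas as pd

# Hypothetical jagged data: (event, photon, sample) -> amplitude.
index = pd.MultiIndex.from_tuples(
    [(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0), (2, 0, 0), (2, 0, 1)],
    names=["entry", "subentry", "subsubentry"],
)
waveform = pd.DataFrame({"amplitude": [0.1, 0.4, 0.2, 0.9, 0.3, 0.5]}, index=index)

# Keep only events 0 and 2 (all their photons and samples).
wanted_events = [0, 2]
selected = waveform[waveform.index.get_level_values("entry").isin(wanted_events)]

# Alternatively, take one cross-section: photon 0 of every event.
# xs drops the selected level from the result's index.
photon0 = waveform.xs(0, level="subentry")

print(selected)
print(photon0)
```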
1 vote, 1 answer

Cannot create 3rd lagged columns with dask-cudf

I have the following dask_cudf.core.DataFrame:- import pandas as pd import numpy as np import dask_cudf import cudf data = {"x":range(1,21), "nor":np.random.normal(2, 4, 20), "unif":np.random.uniform(size = 20)} df = cudf.DataFrame(data) ddf =…
Shawn Brar
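On a single cuDF (or pandas) DataFrame, lagged columns take one `shift` call per lag. The following is a minimal sketch on a non-distributed frame built from the question's data. It does not reproduce or explain the dask-cudf failure, and the `x_lag{k}` column names are hypothetical.

```python
import numpy as np

try:
    import cudf as xdf
except ImportError:
    import pandas as xdf

data = {
    "x": list(range(1, 21)),
    "nor": np.random.normal(2, 4, 20),
    "unif": np.random.uniform(size=20),
}
df = xdf.DataFrame(data)

# One new column per lag: row i receives the value of row i - k
# (the first k rows have no predecessor and become null/NaN).
for k in (1, 2, 3):
    df[f"x_lag{k}"] = df["x"].shift(k)

result = df.to_pandas() if hasattr(df, "to_pandas") else df
print(result.head())
```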
1 vote, 1 answer

TypeError: First element of field tuple is neither a tuple nor str, with cuDF.DataFrame.apply(func,axis)

I am trying to apply histogram row-wise using the apply function but getting an error. Below code is the implementation def f(row): return np.histogram(row, bins=5,range=(1,10)) import torch import cudf as df torch.manual_seed(1) bins =…
ammar naich
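As an alternative to `DataFrame.apply` with a function that returns a tuple, you can compute histograms for all rows at once on the underlying array. This sketch substitutes a vectorized bincount technique that does not come from the question. It assigns each value a bin using the equal-width rule `np.histogram` applies for `bins=5, range=(1, 10)`, offsets the bins by row, and counts everything with a single `bincount`. Results should match `np.histogram` except for possible floating-point edge cases exactly at bin boundaries. The code uses CuPy when it is importable and NumPy otherwise. `rowwise_histogram` is a hypothetical helper.

```python
# Use CuPy on a GPU when importable, otherwise NumPy; both expose the calls below.
try:
    import cupy as xp
except ImportError:
    import numpy as xp


def rowwise_histogram(data, bins, lo, hi):
    """Per-row counts in `bins` equal-width bins over [lo, hi] (hypothetical helper).

    Follows np.histogram's rule: bins are half-open except the last, which
    includes `hi`; values outside [lo, hi] are ignored.
    """
    n_rows = data.shape[0]
    # Bin index of every element, using the same scaling as np.histogram.
    idx = xp.floor((data - lo) * (bins / (hi - lo))).astype(xp.int64)
    idx = xp.where(data == hi, bins - 1, idx)  # right edge belongs to the last bin
    in_range = (data >= lo) & (data <= hi)
    # Shift each row's indices into its own block of `bins` slots.
    offsets = xp.arange(n_rows)[:, None] * bins
    flat = (idx + offsets)[in_range]
    counts = xp.bincount(flat, minlength=n_rows * bins)
    return counts.reshape(n_rows, bins)


data = xp.asarray([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   [10, 10, 1, 0, 11, 5, 5, 5, 5, 5]], dtype=xp.float64)
print(rowwise_histogram(data, bins=5, lo=1.0, hi=10.0))
# expected counts: [[2, 2, 2, 2, 2], [1, 0, 5, 0, 2]]
```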
1 vote, 0 answers

RAPIDS cuml KNeighbors: number of landmark samples must be >= k

Minimum reproducible example: import cudf from cuml.neighbors import KNeighborsRegressor d = { 'id':['a','b','c','d','e','f'], 'latitude':[50,-22,13,37,43,14], 'longitude':[3,-43,100,27,-4,121], } df = cudf.DataFrame(d) knn =…
pjmathematician
1 vote, 2 answers

Join values from a DataFrame according to an array of indices

I have a DataFrame test with shape (1138812, 57). The head looks like this: And I have an array indices which has a shape (1138812, 25). It is a 2D array with each subarray having 25 indices. It looks like this: [ the indices array has 25 indices…
pjmathematician
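If each row of `indices` holds row positions into `test`, fancy indexing a column with the whole 2-D index array gathers every referenced value in one step. The following is a minimal sketch with tiny stand-in data. The column `val`, the `val_{j}` output names, and the assumption that `indices` holds positional row numbers are all hypothetical, because the question is truncated. It uses cuDF/CuPy when importable and pandas/NumPy otherwise.

```python
try:
    import cudf as xdf
    import cupy as xp
except ImportError:
    import pandas as xdf
    import numpy as xp

# Hypothetical small stand-ins for `test` (one column shown) and `indices`.
test = xdf.DataFrame({"val": [10, 20, 30, 40]})
indices = xp.asarray([[1, 2], [0, 3], [3, 3], [2, 0]])  # row positions, shape (n_rows, k)

# Fancy indexing gathers one value per (row, neighbour) pair -> shape (n_rows, k).
col = test["val"].values  # CuPy array for cuDF, NumPy array for pandas
gathered = col[indices]

# One output column per neighbour slot.
k = indices.shape[1]
joined = xdf.DataFrame({f"val_{j}": gathered[:, j] for j in range(k)})
joined_host = joined.to_pandas() if hasattr(joined, "to_pandas") else joined
print(joined_host)
```

Applying the same indexing to a 2-D array of several columns gives shape (n_rows, k, n_cols). For (1138812, 25) indices and 57 columns of 8-byte values, that is roughly 13 GB, so you may need to gather one column at a time.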
1 vote, 2 answers

cudf installation issue on CentOS 7

I'm new to the RAPIDS AI libraries. I have an existing conda environment YAML file where I'm using python 3.8.5, tensorflow 2.7.0, opencv-python-headless 4.5.5.62, numpy 1.22.2, pandas 1.4.1, pandas-profiling 3.1.0, seaborn 0.11.2, matplotlib 3.5.1,…
soumeng78