Questions tagged [cudf]

Use this tag for questions specifically related to the cuDF library or to cuDF DataFrame manipulation.

From PyPI: The RAPIDS cuDF library is a GPU DataFrame manipulation library based on Apache Arrow that accelerates loading, filtering, and manipulation of data for model training data preparation. The RAPIDS GPU DataFrame provides a pandas-like API that will be familiar to data scientists, so they can now build GPU-accelerated workflows more easily.
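The pandas-like API described above can be shown with a short sketch. It assumes a RAPIDS install on a supported NVIDIA GPU for the cuDF path. So that it also runs on a CPU-only machine, it falls back to pandas when `cudf` cannot be imported. That fallback is only an illustration choice and is not part of cuDF. The column names and values are made up.

```python
# Use cuDF when it is importable (needs RAPIDS and an NVIDIA GPU);
# otherwise fall back to pandas, whose API cuDF largely mirrors.
try:
    import cudf as xdf
except ImportError:
    import pandas as xdf

# Hypothetical sample data.
df = xdf.DataFrame({
    "store": ["a", "b", "a", "c", "b"],
    "sales": [10, 20, 30, 40, 50],
})

# Boolean filtering and groupby aggregation use the same syntax in cuDF and pandas.
big = df[df["sales"] > 15]
totals = big.groupby("store")["sales"].sum().sort_index()
print(totals)
```

Because the code never names the backend after the import, the same script runs on either library. Converting a cuDF result for CPU-side inspection is done with `.to_pandas()`.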

146 questions
1 vote, 1 answer

Not able to install cudf, cupy and cuml into colab with rapids.ai version 21.08

I am trying to install cuDF and cuML on Google Colab Pro following this tutorial: rapids_cudf.ipynb - Colaboratory. But after running the following block of code: # install miniconda !wget -c…
user14151235

1 vote, 0 answers

Python cuDF cannot use cuDF dataframe function inside UDF

I am trying to use cuDF's apply_rows to calculate a new column based on other rows. For a single row, it works well with the following script: filteredhlcdf.loc[(filteredhlcdf.ddate == %%Ddate%%) & (filteredhlcdf.sstart == %%sTime%%) &…

1 vote, 2 answers

cuDF read_csv error: total size of strings is too large for cudf column

I use Colab and run cudf.read_csv() on a huge CSV file (3 GB, 17,540,000 records), but the result is wrong. import cudf import numpy as np import pandas as pd import csv g_df = cudf.read_csv('drive/MyDrive/m1.csv',escapechar="\\") error message…
vicpython

1 vote, 1 answer

TypeError: 'BlockManager' object is not iterable

I'm trying to merge a cudf dataframe and a geopandas dataframe. df = df.merge(parishes[['NAME_3', 'area']], left_on='Parish', right_on='NAME_3').drop(columns=['NAME_3']) df is a cudf dataframe and parishes is a geopandas dataframe. On running the…
AnonymousMe
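One plausible cause of this error is that the two operands come from different libraries. `df` is a cuDF frame and `parishes` is a GeoPandas (pandas-based) frame, while cuDF's `merge` expects another cuDF object. The following is a minimal sketch that assumes this diagnosis. It converts the pandas side with `cudf.from_pandas` before merging and keeps only non-geometry columns, as the question already does. The `to_backend` helper and the sample data are hypothetical, and the sketch falls back to pandas when cuDF is not installed so that it still runs.

```python
import pandas as pd

try:
    import cudf
except ImportError:
    cudf = None  # CPU-only fallback so the sketch still runs


def to_backend(pdf):
    """Hypothetical helper: move a pandas/GeoPandas frame to cuDF when available."""
    plain = pd.DataFrame(pdf)  # build a plain pandas DataFrame from the input
    return cudf.from_pandas(plain) if cudf is not None else plain


# Hypothetical stand-ins for the question's `df` (cuDF) and `parishes` (GeoPandas).
df = to_backend(pd.DataFrame({"Parish": ["P1", "P2", "P3"], "visits": [5, 7, 9]}))
parishes = pd.DataFrame({"NAME_3": ["P1", "P2", "P3"], "area": [1.5, 2.0, 3.25]})

# Convert the right-hand side first so both operands come from the same library.
right = to_backend(parishes[["NAME_3", "area"]])
merged = df.merge(right, left_on="Parish", right_on="NAME_3").drop(columns=["NAME_3"])
print(merged)
```

The reverse direction also works: call `df.to_pandas()` and merge on the CPU. That avoids the conversion but gives up GPU execution for the merge.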

1 vote, 2 answers

How to create a unique ID column in dask_cudf

How do I create a unique ID column in a dask_cudf DataFrame across all partitions? So far I am using the following technique, but if I increase the data to more than 10 crore (100 million) rows, it gives me a memory error. def unique_id(df): rag = cupy.arange(len(df)) …
A14
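One way to produce IDs that are unique across partitions works in three steps. First, count the rows in each partition. Second, turn those counts into starting offsets with an exclusive prefix sum. Third, number each partition starting from its own offset. The sketch below runs this logic on a plain list of pandas DataFrames that stand in for dask_cudf partitions, which is a hypothetical setup. `add_global_ids` and the `uid` column name are illustrative. It uses `numpy.arange`; on the GPU, `cupy.arange` would play the same role.

```python
import numpy as np
import pandas as pd


def add_global_ids(partitions, column="uid"):
    """Give every row a consecutive integer ID that is unique across partitions.

    `partitions` is a list of DataFrames standing in for the partitions of a
    dask_cudf DataFrame (hypothetical setup for illustration).
    """
    lengths = [len(p) for p in partitions]
    # Exclusive prefix sum: partition i starts after all rows of partitions < i.
    offsets = np.concatenate(([0], np.cumsum(lengths)[:-1]))
    out = []
    for part, start in zip(partitions, offsets):
        part = part.copy()  # leave the caller's partitions untouched
        part[column] = np.arange(start, start + len(part))
        out.append(part)
    return out


parts = [pd.DataFrame({"x": [1, 2, 3]}), pd.DataFrame({"x": [4]}), pd.DataFrame({"x": [5, 6]})]
for p in add_global_ids(parts):
    print(p["uid"].tolist())
# prints:
# [0, 1, 2]
# [3]
# [4, 5]
```

With Dask, the per-partition counts can be computed cheaply with `ddf.map_partitions(len).compute()`. How the offsets are then applied back to each partition (for example through `map_partitions`) is an assumption you would need to adapt to your Dask version.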

1 vote, 0 answers

Pandas TypeError when using cudf dataframe, but not pandas

I don't think I'm trying to solve this as much as understand what's going on so I can apply it in the context of my larger project. I am working on rewriting a Python package to run on GPU. Anyway, I am using cudf and cuml to pass a dataframe to a…
datahappy

1 vote, 1 answer

Understanding dask cudf object lifecycle

I want to understand the efficient memory management process for Dask objects. I have set up a Dask GPU cluster and I am able to execute tasks that run across the cluster. However, with the Dask objects, especially when I run the compute function,…

1 vote, 2 answers

reading a huge csv file using cudf

I am trying to read a huge CSV file with cuDF but get memory issues. import cudf cudf.set_allocator("managed") cudf.__version__ user_wine_rate_df = cudf.read_csv('myfile.csv', sep = "\t", …
Areza
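A common way to reduce the memory needed for a large CSV load is to read only the columns you need and give them narrow dtypes from the start. Both `cudf.read_csv` and `pandas.read_csv` accept `usecols` and a per-column `dtype` mapping. The file, its column names, and the chosen dtypes below are hypothetical. The pandas fallback exists only so that the sketch runs without a GPU.

```python
import os
import tempfile

try:
    import cudf as xdf  # GPU path (requires RAPIDS)
except ImportError:
    import pandas as xdf  # CPU fallback so the sketch still runs

# Tiny hypothetical tab-separated file standing in for "myfile.csv".
text = "user_id\twine_id\trating\tcomment\n1\t10\t4.5\tnice\n2\t11\t3.0\tok\n"
path = os.path.join(tempfile.mkdtemp(), "ratings.tsv")
with open(path, "w") as fh:
    fh.write(text)

# Load only the needed columns with the narrowest dtypes that fit the data;
# skipping wide string columns (here "comment") often saves the most memory.
df = xdf.read_csv(
    path,
    sep="\t",
    usecols=["user_id", "wine_id", "rating"],
    dtype={"user_id": "int32", "wine_id": "int32", "rating": "float32"},
)
print(df.dtypes)
```

This approach works alongside the managed-memory allocator used in the question: it shrinks what has to fit on the device rather than enlarging the memory available.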

1 vote, 1 answer

Why am I getting an assertion error when creating a DeviceQuantileDMatrix?

I am using the following code to load a CSV file into a dask_cudf DataFrame and then create a DeviceQuantileDMatrix for XGBoost, which yields the error: cluster = LocalCUDACluster(rmm_pool_size=parse_bytes("9GB"), n_workers=5, threads_per_worker=1) client =…
lara_toff

1 vote, 1 answer

How do I install dask_cudf?

I am using the following lines in the terminal to install RAPIDS and then dask_cudf: conda create -n rapids-core-0.14 -c rapidsai -c nvidia -c conda-forge \ -c defaults rapids=0.14 python=3.7 cudatoolkit=10.1 conda activate rapids-core-0.14 conda…
lara_toff

1 vote, 1 answer

python cuDF groupby apply with ordered data

I have some ordered data where there is a hierarchy of events. Each column is a unique id of an event with respect to the event above it in the hierarchy. Something similar to how each day number is unique in a month, and each month number is unique…
Kyle

1 vote, 1 answer

Why is the cuML predict() method for KNearestNeighbors taking so long with a dask_cudf DataFrame?

I have a large dataset (around 80 million rows) and I am training a KNearestNeighbors Regression model using cuML with a dask_cudf DataFrame. I am using 4 GPUs with an rmm_pool_size of 15 GB each: from dask.distributed import Client from dask_cuda…
agp

1 vote, 1 answer

Calculating haversine distances on groups using cudf and cuspatial

I am trying to use accelerated (GPU backed) computing for distance calculations, but have had a lot of trouble with the nuances between pandas and cudf. I have a df with vehicles and points in time (lat,lng,timestamp), my cpu based calculation was…
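The usual vectorized pattern for per-vehicle distances has two parts. First, sort by vehicle and time and fetch each row's previous position with a groupby `shift`. Then apply the haversine formula to whole columns at once. The sketch below is a CPU pandas/NumPy reference. The data, the column names, and the 6371 km Earth radius are assumptions. Recent cuDF releases also provide groupby `shift`. Porting the trigonometry to the GPU (for example with CuPy or cuspatial's haversine function) is an assumption to check against your installed versions.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed value)


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between points given in degrees (vectorized)."""
    lat1, lng1, lat2, lng2 = map(np.radians, (lat1, lng1, lat2, lng2))
    dlat = lat2 - lat1
    dlng = lng2 - lng1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlng / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


# Hypothetical trip data: one row per (vehicle, timestamp) position fix.
df = pd.DataFrame({
    "vehicle": ["v1", "v1", "v1", "v2", "v2"],
    "timestamp": [1, 2, 3, 1, 2],
    "lat": [0.0, 0.0, 1.0, 10.0, 10.0],
    "lng": [0.0, 1.0, 1.0, 20.0, 20.0],
})

df = df.sort_values(["vehicle", "timestamp"])
# Previous position of the same vehicle; NaN on each vehicle's first row.
prev = df.groupby("vehicle")[["lat", "lng"]].shift(1)
df["dist_km"] = haversine_km(prev["lat"], prev["lng"], df["lat"], df["lng"])
# Total distance per vehicle (sum skips the NaN first rows).
print(df.groupby("vehicle")["dist_km"].sum())
```

Shifting within groups means no distance is ever computed between the last point of one vehicle and the first point of the next.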

1 vote, 1 answer

ModuleNotFoundError: No module named 'cudf' in google colab

I tried importing cudf and get the following error: ModuleNotFoundError Traceback (most recent call last) in () ----> 1 import cudf; print('cuDF Version:', cudf.__version__) ModuleNotFoundError: No module…
Keerthi Sree
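A quick diagnostic for this kind of `ModuleNotFoundError` is to ask whether the interpreter running the notebook can find the `cudf` package at all, without triggering the traceback. The check also reports which Python executable is in use, because a package installed into a different environment than the kernel's is a common cause. It relies only on the standard library's `importlib.util.find_spec`. The `cudf_status` helper is hypothetical.

```python
import importlib.util
import sys


def cudf_status() -> str:
    """Report whether cuDF is importable by the current Python interpreter."""
    if importlib.util.find_spec("cudf") is None:
        # The package is not visible to this interpreter, e.g. RAPIDS was
        # installed into a different environment than the notebook kernel uses.
        return f"cudf not found for interpreter {sys.executable}"
    import cudf  # only imported once we know it exists
    return f"cuDF version: {cudf.__version__}"


print(cudf_status())
```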

1 vote, 2 answers

GPU drivers (CUDA, cuDF, etc.) downloaded but they don't work

My GPU is an RTX 2070. I have followed every step from https://github.com/rapidsai/cudf (I used the steps "for CUDA 10.1") but had no luck: I can't use my GPU. I have also reinstalled the Ubuntu OS and those drivers many times. Does anyone know how to…
Steve