Questions tagged [cudf]

Use this tag for questions specifically related to the cuDF library or to DataFrame manipulation with cuDF.

From PyPI: The RAPIDS cuDF library is a GPU DataFrame manipulation library based on Apache Arrow that accelerates loading, filtering, and manipulation of data for model training data preparation. The RAPIDS GPU DataFrame provides a pandas-like API that will be familiar to data scientists, so they can now build GPU-accelerated workflows more easily.

146 questions

3 answers

cuDF for text / string

I am new to cuDF and may not have understood the purpose of the construct, so this is a very generic question. I have a dataset that has mostly string columns, and I was hoping to use apply_rows to perform the processing of the strings,…

attributeerror cudf

asked Mar 30 '20 at 13:52

Mayukh

1 answer

Expected a bytes object, got a 'int' object error with cudf

I have a pandas DataFrame in which all the columns are of object type. I am trying to convert it to cuDF with cudf.from_pandas(df), but I get this error: ArrowTypeError: Expected a bytes object, got a 'int' object. I don't understand why even that…

pandas dataframe conda cudf

asked Mar 10 '20 at 16:11

el abed houssem
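
A minimal sketch for the question above, assuming the ArrowTypeError comes from an object column that mixes `str` and `int` values (Arrow then cannot infer a single type for it). The frame and column names `code` and `qty` are hypothetical. The fix shown is pandas-side: cast object columns to `str` before calling `cudf.from_pandas`.

```python
# Assumption: the error is caused by an object column mixing str and int values.
import pandas as pd

# Hypothetical frame with one mixed-type object column and one numeric column.
pdf = pd.DataFrame({"code": ["a", 1, "b", 2], "qty": [3, 4, 5, 6]})

# Cast every object-dtype column to str so each column holds a single type.
obj_cols = pdf.select_dtypes(include="object").columns
pdf[obj_cols] = pdf[obj_cols].astype(str)

# On a machine with cuDF installed, the conversion would then be:
#   import cudf
#   gdf = cudf.from_pandas(pdf)
```

Note that `astype(str)` also turns missing values into strings such as `'nan'`, so handling nulls first, or casting only the offending columns, may be preferable.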

1 answer

`pip install cudf-cuda100` results in "ERROR: No matching distribution found for cudf-cuda100"

I run Windows 10 and have installed Anaconda. I am trying to install cudf but I repeatedly fail: (tf2) C:\WINDOWS\system32>pip install cudf-cuda100 ERROR: Could not find a version that satisfies the requirement cudf-cuda100 (from versions:…

python anaconda rapids cudf

asked Feb 07 '20 at 18:57

user8270077

1 answer

How to install a library on a Google Cloud Platform AI Platform Notebook instance

I am currently a data science undergraduate student trying to use a Google Cloud Platform AI Platform Notebook instance for a data science project. The following image shows what I am talking about. I have no problem running the instance and…

python google-cloud-platform anaconda jupyter-lab cudf

asked Oct 10 '19 at 08:40

Rui

1 answer

How to ensure number of `partitions` is equally distributed across workers with dask and dask-cudf?

I am trying to run a basic ETL workflow on large files using dask-cudf across a large number of workers. Problem: initially the scheduler assigns equal numbers of partitions to be read across workers, but during the pre-processing…

dask cudf

asked Oct 04 '19 at 18:33

Vibhu Jawa

1 answer

cuDF error when processing a large number of Parquet files

I have 2000 parquet files in a directory. Each parquet file is roughly 20MB in size. The compression used is SNAPPY. Each parquet file has rows that look like the following: +------------+-----------+-----------------+ | customerId | productId |…

python nvidia dask parquet cudf

asked Sep 26 '19 at 09:50

chochim

1 answer

Convert cuDF data frame column to 1 or 0 for “true”/“false” values

I am using the RAPIDS (0.9 release) Docker container. How can I do the following with RAPIDS cuDF? `df['new_column'] = df['column_name'] > condition` followed by `df[['new_column']] *= 1`

rapids cudf

asked Aug 22 '19 at 14:12

rnyai
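
A sketch of the usual idiom for the question above: compare, then cast the boolean result to an integer type instead of multiplying in place. It uses pandas as a CPU stand-in; cuDF mirrors this API, so the same lines are expected to work after `import cudf`, but that is an assumption to check against your cuDF version. The data and `threshold` are hypothetical stand-ins for `condition`.

```python
# pandas used as a CPU stand-in; cuDF is assumed to accept the same calls.
import pandas as pd

df = pd.DataFrame({"column_name": [1, 5, 3, 8]})
threshold = 4  # hypothetical value standing in for `condition`

# The comparison yields a boolean Series; astype maps True/False to 1/0.
df["new_column"] = (df["column_name"] > threshold).astype("int8")
```

Casting with `astype` keeps this to one expression and lets you pick a compact dtype such as `int8`.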

1 answer

How to use cudf.Series.applymap()?

Can someone please provide a few examples of how to use the applymap method on a cuDF Series? Below is copied from the docs and here is a link to the documentation. applymap(self, udf, out_dtype=None) Apply an elementwise function to transform the…

series rapids cudf

asked Aug 13 '19 at 16:15

gumdropsteve

3 answers

How to apply an if condition in a GPU DataFrame (cuDF) to filter the DataFrame?

I'd like to filter a cuDF data frame based on a column value, and then create a new column based on a condition specified. Basically, how can I apply the following in cuDF? df.loc[df.column_name condition, 'new column name'] = 'value if condition is…

rapids cudf

asked Jul 27 '19 at 00:44

rnyai
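
A minimal sketch of conditional column assignment for the question above: set a default value, then overwrite the rows selected by a boolean mask through `.loc`. It is shown in pandas; recent cuDF releases appear to accept the same `.loc[mask, column] = value` form, but verify that on your version. The `score` and `label` columns and the cutoff of 50 are hypothetical.

```python
# pandas used as a CPU stand-in; the same .loc pattern is assumed to work in cuDF.
import pandas as pd

df = pd.DataFrame({"score": [10, 55, 70, 30]})  # hypothetical data

# Start every row at the fallback value, then overwrite the rows
# where the boolean mask is True.
df["label"] = "low"
df.loc[df["score"] > 50, "label"] = "high"
```

Setting the default first ensures rows that fail the condition still get a defined value rather than a null.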

2 answers

How to drop columns with NA using cudf?

In pandas: `data = data.dropna(axis='columns')`. I am trying to do something similar with a cuDF DataFrame, but the APIs don't offer this functionality. My solution is to convert to a pandas DataFrame, run the above command, then convert back to cuDF. Is…

python rapids cudf

asked May 30 '19 at 16:37

Sterls
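
A sketch of column-wise null dropping for the question above, shown in pandas. Recent cuDF releases appear to document an `axis` parameter on `DataFrame.dropna`, so the same call may work directly on a cuDF DataFrame without the pandas round trip; that is an assumption to confirm against your installed version. The columns `a`, `b` and `c` are hypothetical.

```python
# pandas used as a CPU stand-in; cuDF's dropna is assumed to accept axis too.
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [1.0, None, 3.0],    # one null, so dropped with how="any"
    "c": [None, None, None],  # all null, so dropped either way
})

# axis="columns" (or axis=1) drops every column containing at least one null;
# how="all" would drop only columns that are entirely null.
clean = df.dropna(axis="columns")
```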

1 answer

Memory leak error on Dask when running a job on multiple GPUs

I would like to process some textual data with "sentence-transformers" (generating embeddings for the text) on multiple GPUs (2 T4s, 15 GB per GPU) and 16 vCPUs (with 60 GB RAM) on GCP from a Jupyter notebook. The data size is not large but the…

python dask dask-distributed dask-dataframe cudf

asked Jun 26 '23 at 23:45

mtnt

2 answers

Error accessing an attribute of a dask_cudf Series when it is called from a user-defined function

My question is related to my previous one at Error of using parallelizing data processing by "sentence_transformers" on 2 GPUs from Jupyter notebook. I have tried a new solution because I got an error with the proposed one. I would like to use…

python dask dask-distributed dask-dataframe cudf

asked Jun 21 '23 at 23:01

mtnt

0 answers

Error adding a new column to a dask_cudf DataFrame from a 2-D numpy.ndarray

I would like to assign a new column to a dask_cudf DataFrame from a Jupyter notebook. The new column is a 2-dimensional numpy.ndarray. My code: import cudf import dask_cudf import numpy as np from random import random df = cudf.DataFrame( { …

numpy dask numpy-ndarray dask-dataframe cudf

asked Jun 14 '23 at 23:59

mtnt

1 answer

Troubleshooting cudf.tokenize(): 'Length Mismatch' error with non-space delimiters

Cudf Tokenize Element Length Mismatch. This is the expected result for tokenize(' ') on the space character: 0 Due, 0 to, 0 being, 0 on, 0 FMLA …

python rapids cudf

asked May 20 '23 at 01:53

Using_System

0 answers

Calculating dispersion_norm using cuDF

I've been working on building a GPU-accelerated package based on Scanpy using the CUDA toolkit (cudf=23.02, cuml=23.02, cugraph=23.02, cudatoolkit=11.8). I'm currently implementing the highly variable genes function, but I'm running into some strange…

statistics bioinformatics cudf scanpy

asked Apr 03 '23 at 14:54

Wesley Rademaker
