I am new to cuDF and may not have understood the purpose of this construct, so this is a fairly generic question. My dataset has mostly string columns, and I was hoping to use apply_rows to process the strings. However, I realized that it may only work with numeric data.
Here is an example that is quoted on most sites:
import cudf
import numpy as np
df = cudf.DataFrame()
nelem = 3
df['col1'] = np.arange(nelem)
df['col2'] = np.arange(nelem)
df['col3'] = np.arange(nelem)
# Define input columns for the kernel
col1 = df['col1']
col2 = df['col2']
col3 = df['col3']
def kernel(col1, col2, col3, out1, out2, kwarg1, kwarg2):
    for i, (x, y, z) in enumerate(zip(col1, col2, col3)):
        out1[i] = kwarg2 * x - kwarg1 * y
        out2[i] = y - kwarg1 * z
# Every output array the kernel writes to must be declared in outcols
df.apply_rows(kernel,
              incols=['col1', 'col2', 'col3'],
              outcols=dict(out1=np.float64, out2=np.float64),
              kwargs=dict(kwarg1=3, kwarg2=4))
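For reference, this is what I understand the kernel above to compute for each row, written as plain NumPy. It is only a CPU sketch of the arithmetic, not cuDF code, and `reference_kernel` is a name I made up for illustration:

```python
import numpy as np

def reference_kernel(col1, col2, col3, kwarg1, kwarg2):
    # Same element-wise arithmetic as the kernel, expressed on whole arrays
    out1 = kwarg2 * col1 - kwarg1 * col2
    out2 = col2 - kwarg1 * col3
    # The apply_rows example declares its outputs as float64, so match that here
    return out1.astype(np.float64), out2.astype(np.float64)

nelem = 3
col = np.arange(nelem)  # all three input columns hold 0, 1, 2
out1, out2 = reference_kernel(col, col, col, kwarg1=3, kwarg2=4)
print(out1)
print(out2)
```

Both outputs are purely numeric expressions of the inputs, which is why I suspect a string column does not fit.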
If I change col3 in the cuDF example to a string column:
import cudf
import numpy as np
df = cudf.DataFrame()
nelem = 3
df['col1'] = np.arange(nelem)
df['col2'] = np.arange(nelem)
df['col3'] = ['a','a','a'] # <<- change to string
# Define input columns for the kernel
col1 = df['col1']
col2 = df['col2']
col3 = df['col3']
def kernel(col1, col2, col3, out1, out2, kwarg1, kwarg2):
    for i, (x, y, z) in enumerate(zip(col1, col2, col3)):
        out1[i] = kwarg2 * x - kwarg1 * y
        out2[i] = y - kwarg1 * z
Calling apply_rows with the same arguments as before then reports an error like AttributeError: 'nvstrings' object has no attribute 'to_gpu_array'.
Is apply_rows designed to work only with numerical values? I am assuming it is intended for matrix-type operations, which would explain this constraint. Can someone provide some insight here?