I want to pad each list produced by collecting values in a groupby operation.
Conceptually, the implementation would look like this:
import cudf
import numpy as np

df = cudf.DataFrame({"g": [1, 1, 1, 2, 2, 3], "a": [1, 2, 3, 1, 3, 1]})
df.groupby("g")["a"].collect().list.pad(max_length=3, pad_left=True, drop="last", padding_value=-1)
Expected output:
g
1 [1, 2, 3]
2 [-1, 1, 3]
3 [-1, -1, 1]
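To pin down the semantics of the conceptual `pad` call, here is a minimal plain-Python reference sketch. `pad_list` is a hypothetical helper, not a cuDF API, and it assumes that `drop="last"` means discarding trailing elements when a list is longer than `max_length`.

```python
def pad_list(values, max_length, pad_left=True, drop="last", padding_value=-1):
    """Plain-Python reference for the hypothetical `.list.pad` call (not a cuDF API)."""
    values = list(values)
    if len(values) > max_length:
        # Assumption: drop="last" discards trailing elements,
        # any other value discards leading elements.
        values = values[:max_length] if drop == "last" else values[-max_length:]
    padding = [padding_value] * (max_length - len(values))
    # pad_left=True puts the padding in front of the values.
    return padding + values if pad_left else values + padding


# The collected lists from the example DataFrame, keyed by group.
groups = {1: [1, 2, 3], 2: [1, 3], 3: [1]}
padded = {g: pad_list(v, max_length=3) for g, v in groups.items()}
print(padded)
```

This only describes the behavior I am after for each group; it runs on the CPU and does not address performance.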
How can I do this?
Converting to a pandas Series and applying `np.pad` works, but it seems awkward and slow. Is there a way to do it in cuDF/CuPy?
cudf.from_pandas(
df.groupby("g")["a"]
.collect()
.to_pandas()
.apply(lambda x: np.pad(x, (max(3 - len(x), 0), 0), constant_values=(-1,)))
)
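One possible direction that avoids the per-row `apply` is to scatter the flat values into a pre-filled dense array. The sketch below uses NumPy so that it runs anywhere. `pad_groups_left` is a hypothetical helper, and it assumes the keys are sorted so that each group is contiguous. CuPy mirrors much of the NumPy API, so the same calls may carry over to the GPU, but that is untested here.

```python
import numpy as np


def pad_groups_left(keys, values, max_length, padding_value=-1):
    """Left-pad each group's values into a dense (n_groups, max_length) array.

    Assumes `keys` is sorted so that each group is contiguous.
    Groups longer than `max_length` keep only their first `max_length` values.
    """
    keys = np.asarray(keys)
    values = np.asarray(values)
    uniq, starts, counts = np.unique(keys, return_index=True, return_counts=True)

    # Row index of each value, and its position within its own group.
    row = np.repeat(np.arange(len(uniq)), counts)
    pos = np.arange(len(keys)) - np.repeat(starts, counts)

    # Shift each group right by the amount of padding it needs.
    kept = np.minimum(counts, max_length)
    shift = np.repeat(max_length - kept, counts)
    mask = pos < max_length  # drop trailing values of over-long groups

    out = np.full((len(uniq), max_length), padding_value, dtype=values.dtype)
    out[row[mask], (pos + shift)[mask]] = values[mask]
    return uniq, out


group_keys, padded = pad_groups_left(
    [1, 1, 1, 2, 2, 3], [1, 2, 3, 1, 3, 1], max_length=3
)
print(group_keys)
print(padded)
```

The result is a 2-D array rather than a list column, so it would still need to be converted back if a cuDF list Series is required.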
c.f. Applying `.apply()` to a cuDF Series of list type raises `NumbaNotImplementedError`:
NumbaNotImplementedError: list
df = cudf.DataFrame({"g": [1, 1, 1, 2, 2, 3], "a": [1, 2, 3, 1, 3, 1]})
df.groupby("g")["a"].collect().apply(
lambda x: np.pad(x, (max(3 - len(x), 0), 0), constant_values=(-1,))
)