Using CuPy/cuDF, remove elements that are not distant enough to their previous elements from a sorted list

Question

The purpose of the code is similar to this post

I have code that runs on the CPU:

import pandas as pd


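def remove(s: pd.Series, thres: int) -> pd.Series:
    pivot = -float("inf")  # last kept element; -inf so the first element is always kept
    new_s = []
    for e in s:
        # keep e only if it is more than thres above the last kept element
        if (e - pivot) > thres:
            new_s.append(e)
            pivot = e
    return pd.Series(new_s)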

# s is an ascending sequence
s = pd.Series([0,1,2,4,6,9])
remove(s, thres=3)
# Out:
# 0    0
# 1    4
# 2    9
# dtype: int64

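The input is an ascending sequence of integer values.
The function keeps an element only if it exceeds the last kept element (the pivot) by more than thres; that is, it removes every s[i] with s[i] - pivot <= thres. Note that the comparison is against the last kept element, not necessarily against s[i-1].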

My problem is that CuPy and cuDF do not give me a way to run this kind of sequential, element-by-element loop on the GPU, so I can't use them to accelerate the code. The vectorized operations I do have, such as cumsum, diff, and mod, don't fit my needs.

Is there something in CuPy/cuDF like scan in TensorFlow?

The remove function can be reformulated in a form that is similar to prefix sum (scan):

For a sequence [a1, a2, a3], the output should be [a1, a1⨁a2, (a1⨁a2)⨁a3], where ⨁ is defined as

⨁ = lambda x, y: y if (y - x) > thres else x

Then set(output) is what I want.
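
To make the formulation concrete, here is a minimal CPU-only sketch. It uses Python's itertools.accumulate as the scan, not any GPU API, and the name remove_via_scan is hypothetical. It takes ⨁ to return y when (y - x) > thres and x otherwise, which is what the loop in remove does with its pivot.

```python
from itertools import accumulate


def remove_via_scan(values, thres):
    """Sequential scan with the pivot-update operator (illustrative only)."""

    def op(pivot, y):
        # advance the pivot to y only when y is more than thres above it
        return y if (y - pivot) > thres else pivot

    # inclusive scan: [a1, a1 op a2, (a1 op a2) op a3, ...]
    scanned = list(accumulate(values, op))
    # each distinct pivot is one kept element; sorted() restores ascending order
    kept = sorted(set(scanned))
    return scanned, kept


scanned, kept = remove_via_scan([0, 1, 2, 4, 6, 9], thres=3)
print(scanned)  # [0, 0, 0, 4, 4, 9]
print(kept)     # [0, 4, 9]
```

This still runs element by element, so it only illustrates the scan form; it is not a parallel implementation.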

Note that (a1⨁a2)⨁a3 != a1⨁(a2⨁a3) in general. Because ⨁ is not associative, a parallel scan might not be feasible.
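
A small counterexample, as a sketch with thres = 3 and values I picked for illustration (op stands in for ⨁, returning y when (y - x) > thres and x otherwise):

```python
thres = 3


def op(x, y):
    # pivot-update operator: move to y only if it is more than thres above x
    return y if (y - x) > thres else x


a1, a2, a3 = 0, 2, 4

left = op(op(a1, a2), a3)   # op(0, 2) = 0, then op(0, 4) = 4
right = op(a1, op(a2, a3))  # op(2, 4) = 2, then op(0, 2) = 0

print(left, right)  # 4 0
```

The two groupings disagree, so the usual parallel prefix-scan algorithms, which regroup the operands freely, would not be guaranteed to reproduce the sequential result.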

Update

I found that a function called Inclusive Scan already exists; all I need is a Python wrapper.

Or is there any other way?
