The purpose of the code is similar to this post
I have some code that runs on the CPU:
    import pandas as pd

    def remove(s: pd.Series, thres: int):
        # pivot is the most recently kept value
        pivot = -float("inf")
        new_s = []
        for e in s:
            if (e - pivot) > thres:
                new_s.append(e)
                pivot = e
        return pd.Series(new_s)

    # s is an ascending sequence
    s = pd.Series([0, 1, 2, 4, 6, 9])
    remove(s, thres=3)
    # Out:
    # 0    0
    # 1    4
    # 2    9
    # dtype: int64
The input is an ascending sequence of integer values.
This function removes every point s[i] whose distance from the most recently kept point is not greater than thres, i.e. where d(s[i], pivot) <= thres.
My problem is that CuPy/cuDF do not support loops, so I can't use a GPU to accelerate this code. I only have operations like cumsum, diff, and mod, and they don't fit my needs.
Is there a function like scan in TensorFlow?
The remove function can be reformulated in a form that is similar to a prefix sum (scan):
For a sequence [a1, a2, a3], the output should be [a1, a1⨁a2, (a1⨁a2)⨁a3], where ⨁ is defined as

    ⨁ = lambda x, y: y if (y - x) > thres else x

Here x is the current pivot and y is the next element. Then set(output) is what I want.
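To check this reformulation on the CPU, here is a minimal sketch that uses `itertools.accumulate` from the standard library as a sequential inclusive scan. The names `make_op` and `scan_remove` are made up for illustration, and the operator is written so that it returns the new element y when (y - x) > thres and keeps the pivot x otherwise.

```python
from itertools import accumulate

def make_op(thres):
    # Hypothetical helper: builds the binary operator for a given threshold.
    def op(x, y):
        # x is the current pivot, y is the next element of the sequence.
        return y if (y - x) > thres else x
    return op

def scan_remove(seq, thres):
    # Sequential inclusive scan: [a1, a1 op a2, (a1 op a2) op a3, ...]
    scanned = list(accumulate(seq, make_op(thres)))
    # set() drops the repeated pivots; sorted() restores ascending order.
    return scanned, sorted(set(scanned))

scanned, kept = scan_remove([0, 1, 2, 4, 6, 9], thres=3)
print(scanned)  # [0, 0, 0, 4, 4, 9]
print(kept)     # [0, 4, 9]
```

This only confirms that the scan form matches the loop in remove; it still runs sequentially.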
Note that (a1⨁a2)⨁a3 != a1⨁(a2⨁a3). Without the associative property, parallel computation might not be feasible.
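A small concrete case makes the lack of associativity visible. This sketch assumes thres = 3 and the values 0, 2, 4, chosen only for illustration, with the operator returning the new element y when (y - x) > thres and the pivot x otherwise.

```python
def op(x, y, thres=3):
    # x is the current pivot, y is the next element.
    return y if (y - x) > thres else x

a1, a2, a3 = 0, 2, 4

left = op(op(a1, a2), a3)    # (0 op 2) = 0, then (0 op 4) = 4
right = op(a1, op(a2, a3))   # (2 op 4) = 2, then (0 op 2) = 0

print(left, right)  # 4 0
```

Grouping from the left keeps 4 as a new pivot, while grouping from the right loses it, so a tree-shaped parallel reduction would not give the same result as the loop.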
Update
I found that there is already a function called Inclusive Scan, so all I need is a Python wrapper for it.
Or is there any other way?
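One other way I can think of, as a rough sketch only: because the input is sorted, the next kept point is the first value greater than pivot + thres, and that can be found with a binary search instead of a linear walk. The function name `remove_by_jumps` is made up, and the example uses `bisect_right` from the standard library on a plain list. NumPy and CuPy provide `searchsorted`, which I assume could play the same role on an array, but I have not tested that on a GPU.

```python
from bisect import bisect_right

def remove_by_jumps(sorted_values, thres):
    # Assumes sorted_values is ascending, as in the question.
    kept = []
    i = 0
    n = len(sorted_values)
    while i < n:
        pivot = sorted_values[i]
        kept.append(pivot)
        # Jump to the first index whose value is > pivot + thres.
        i = bisect_right(sorted_values, pivot + thres, i + 1)
    return kept

result = remove_by_jumps([0, 1, 2, 4, 6, 9], thres=3)
print(result)  # [0, 4, 9]
```

The loop still runs sequentially, but only once per kept point rather than once per input element, so it is not a true parallel scan.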