I tried to use the dfply package to create an accumulator column based on a condition, but my custom function failed.
Using the diamonds data as an example: I'd like to create an accumulator column that increases by 1 whenever price is larger than 500 and stays unchanged otherwise.
My code is as follows:
import pandas as pd
from dfply import *

@make_symbolic
def accu(s, threshold):
    cur = 0
    res = []
    for x in s:
        if x > threshold:
            cur += 1
        res += [cur]
    return pd.Series(res)

(diamonds >>
    mask(X.color == 'D', X.cut == 'Premium', X.carat > 0.32) >>
    mutate(row_id = row_number(X.price),      # Get the row number
           accu_id = accu(X.price, 500)) >>   # Get the accumulator, this step failed
    arrange(X.row_id) >>
    head(10)
)
The expected output looks like the following:
price  row_id  accu_id
  498       1        0
  499       2        0
  501       3        1
  502       4        2
  400       5        2
  503       6        3
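
For reference, the accumulator column described above can be sketched in plain pandas, outside of dfply, as a cumulative sum over a boolean comparison. This is only a minimal illustration on the six example prices, not a confirmed fix for the dfply pipeline. The non-contiguous index values are made up to mimic what a row filter leaves behind. One possible cause of the failure, stated here as an assumption, is that `pd.Series(res)` gets a fresh 0..n-1 index while the filtered frame keeps its original row labels, so the values may not line up on assignment. The sketch avoids that by keeping the index of the input series.

```python
import pandas as pd

def accu(s, threshold):
    # Running count of values strictly above the threshold.
    # The comparison and cumsum both preserve the index of `s`.
    return (s > threshold).cumsum()

# Example prices from the expected output, with a hypothetical
# non-contiguous index such as a filtered DataFrame would have.
df = pd.DataFrame(
    {"price": [498, 499, 501, 502, 400, 503]},
    index=[3, 7, 8, 12, 20, 21],
)
df["accu_id"] = accu(df["price"], 500)
print(df)
```

Because the result carries the same index as `df["price"]`, the assignment aligns row by row even after filtering.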