I tried to use the dfply package to create an accumulator column based on a condition, but it fails when I use a custom function.

Using the diamonds data as an example: I'd like to create an accumulator column that increases by 1 whenever price is larger than 500 and stays unchanged otherwise.

My code is as follows:

import pandas as pd
from dfply import *

@make_symbolic
def accu(s, threshold):
    cur = 0
    res = []
    for x in s:
        if x > threshold:
            cur += 1
        res += [cur]
    return pd.Series(res)


(diamonds >> 
 mask(X.color == 'D', X.cut == 'Premium', X.carat > 0.32) >>
 mutate(row_id = row_number(X.price),        # Get the row number
        accu_id = accu(X.price, 500)) >>     # Get the accumulator, this step failed
 arrange(X.row_id) >>
 head(10)
)

The expected output looks like the following:

price row_id accu_id
498   1      0
499   2      0
501   3      1
502   4      2
400   5      2
503   6      3
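
To make the intended logic concrete, here is a minimal sketch in plain pandas (no dfply), assuming a small made-up price column that matches the table above rather than the real diamonds data. The names df and threshold are only for illustration.

```python
import pandas as pd

# Hypothetical toy data matching the expected output above
# (not the real diamonds data).
df = pd.DataFrame({"price": [498, 499, 501, 502, 400, 503]})

# Row number in the current row order, starting at 1.
df["row_id"] = range(1, len(df) + 1)

# Running count of rows whose price exceeds the threshold:
# the boolean comparison is summed cumulatively (True counts as 1).
threshold = 500
df["accu_id"] = (df["price"] > threshold).cumsum()

print(df)
```

Note that cumsum returns a Series with the same index as df, so the new column lines up with the rows even after filtering. A Series built from a plain list, as in my accu function, gets a fresh 0..n-1 index instead, which may be related to the failure.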
