I would like to write a function that does the following:
Raw data:

customer | product | revenue |
---|---|---|
Customer A | Product 1 | EUR 10 |
Customer A | Product 2 | EUR 10 |
Customer B | Product 1 | EUR 5 |
Customer B | Product 2 | EUR 2 |
Customer C | Product 1 | EUR 5 |
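The revenue column in the raw data holds strings such as `EUR 10`. On the front end it is probably easier to work with plain numbers and keep the currency out of the value. This is a minimal sketch that assumes every value has the form `EUR <amount>`. The names `RawRow`, `Row`, `parseEur` and `toRows` are hypothetical:

```typescript
// Hypothetical shape of one raw row as it might arrive on the front end.
interface RawRow {
  customer: string;
  product: string;
  revenue: string; // e.g. "EUR 10"
}

// Same row with revenue as a plain number (assumed to be in EUR).
interface Row {
  customer: string;
  product: string;
  revenue: number;
}

// Parse strings like "EUR 10" or "EUR 2.50"; throws on any other format.
function parseEur(value: string): number {
  const match = /^EUR\s+(-?\d+(?:\.\d+)?)$/.exec(value.trim());
  if (match === null) {
    throw new Error(`Unexpected revenue format: ${value}`);
  }
  return Number(match[1]);
}

// Convert every raw row, keeping customer and product unchanged.
function toRows(raw: RawRow[]): Row[] {
  return raw.map((r) => ({ ...r, revenue: parseEur(r.revenue) }));
}

const rawData: RawRow[] = [
  { customer: "Customer A", product: "Product 1", revenue: "EUR 10" },
  { customer: "Customer A", product: "Product 2", revenue: "EUR 10" },
  { customer: "Customer B", product: "Product 1", revenue: "EUR 5" },
  { customer: "Customer B", product: "Product 2", revenue: "EUR 2" },
  { customer: "Customer C", product: "Product 1", revenue: "EUR 5" },
];

const rows = toRows(rawData);
```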

Target data:

customer | revenue | cumulative revenue |
---|---|---|
Customer A | EUR 20 | EUR 20 |
Customer B | EUR 7 | EUR 27 |
Customer C | EUR 5 | EUR 32 |
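The target table first sums revenue per customer, orders the customers by that total (highest first), and then takes a running sum. Let r_k be the total revenue of the k-th customer in that order and c_k its cumulative revenue. The numbers above then work out as:

```latex
\begin{aligned}
r_1 &= 10 + 10 = 20 \quad (\text{Customer A}) \\
r_2 &= 5 + 2 = 7 \quad (\text{Customer B}) \\
r_3 &= 5 \quad (\text{Customer C}) \\
c_k &= \sum_{i=1}^{k} r_i : \quad c_1 = 20,\ c_2 = 20 + 7 = 27,\ c_3 = 27 + 5 = 32 \ (\text{EUR})
\end{aligned}
```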

I know exactly how to do this in PySpark, but I am unfamiliar with writing functions in TypeScript. I want to trigger the calculation "on the fly" on the front end.
Here is the PySpark code:
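```python
from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Assumes df has a "customer" column and a numeric "net_revenue" column.
# The constant "helper" column puts every row into a single partition; the
# ROWS frame gives a strict running total even if two customers tie on revenue.
window = (
    Window.partitionBy(F.col("helper"))
    .orderBy(F.col("net_revenue").desc())
    .rowsBetween(Window.unboundedPreceding, Window.currentRow)
)

df = (
    df.groupby("customer")
    .agg(F.sum("net_revenue").alias("net_revenue"))  # total per customer
    .withColumn("helper", F.lit(1))
    .withColumn("cumulative_revenue", F.sum("net_revenue").over(window))
)
```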
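One possible TypeScript translation of the PySpark steps above is sketched below: group by customer and sum, sort by total revenue descending, then keep a running total. This is a minimal sketch that assumes revenue has already been converted to a number in EUR (the PySpark code calls this column `net_revenue`). `RevenueRow`, `CustomerRevenue` and `cumulativeRevenueByCustomer` are hypothetical names:

```typescript
// Hypothetical input row: revenue is assumed to already be a number in EUR.
interface RevenueRow {
  customer: string;
  product: string;
  revenue: number;
}

interface CustomerRevenue {
  customer: string;
  revenue: number;
  cumulativeRevenue: number;
}

function cumulativeRevenueByCustomer(rows: RevenueRow[]): CustomerRevenue[] {
  // 1. groupby("customer").agg(sum): total revenue per customer.
  const totals: Record<string, number> = {};
  for (const row of rows) {
    totals[row.customer] = (totals[row.customer] ?? 0) + row.revenue;
  }

  // 2. orderBy(revenue desc): highest total first.
  const sorted = Object.keys(totals)
    .map((customer) => ({ customer, revenue: totals[customer] }))
    .sort((a, b) => b.revenue - a.revenue);

  // 3. sum(...).over(window): running total in that order.
  let running = 0;
  return sorted.map(({ customer, revenue }) => {
    running += revenue;
    return { customer, revenue, cumulativeRevenue: running };
  });
}

const data: RevenueRow[] = [
  { customer: "Customer A", product: "Product 1", revenue: 10 },
  { customer: "Customer A", product: "Product 2", revenue: 10 },
  { customer: "Customer B", product: "Product 1", revenue: 5 },
  { customer: "Customer B", product: "Product 2", revenue: 2 },
  { customer: "Customer C", product: "Product 1", revenue: 5 },
];

// Logs A: 20/20, B: 7/27, C: 5/32, matching the target table.
console.log(cumulativeRevenueByCustomer(data));
```

The function only uses plain arrays and objects and has no side effects, so it can be re-run in the browser whenever the input rows change. No helper partition is needed because the whole array is already a single "partition". Each customer gets its own running total, like the ROWS frame in the PySpark code.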
Could you please advise how I can write this as a TypeScript function?