
I would like to write a function that does the following:

raw data:

customer    product    revenue
Customer A  Product 1  EUR 10
Customer A  Product 2  EUR 10
Customer B  Product 1  EUR 5
Customer B  Product 2  EUR 2
Customer C  Product 1  EUR 5

target data:

customer    revenue  cumulative revenue
Customer A  EUR 20   EUR 20
Customer B  EUR 7    EUR 27
Customer C  EUR 5    EUR 32

I know exactly how to do that in PySpark but am unfamiliar with TypeScript Functions, which I want to use to trigger the calculation "on the fly" on the front end.

Here is the PySpark code:

from pyspark.sql import functions as F, window as W

window = W.Window.partitionBy(F.col("helper")).orderBy(F.col("net_revenue").desc())

df = (
    df
    .groupby("customer")
    .agg(F.sum("net_revenue").alias("net_revenue"))
    .withColumn("helper", F.lit(1))
    .withColumn("cumulative_revenue", F.sum("net_revenue").over(window))
)

Can you please advise how I can write that piece of function code?

  • Hi, a lot of the folks from Palantir who regularly answer these questions are on holiday till EOY. If no one from the community answers before we're back, we'll drop you an answer then. A good way to avoid getting your question closed (I can see one vote to close as "needs more focus") is to provide the PySpark code that you would like to see in Functions. – fmsf Dec 22 '21 at 16:22
  • @fmsf: thanks - here is the PySpark code ''' from pyspark.sql import functions as F, window as W window = W.Window.partitionBy(F.col("helper")).orderBy(F.col("net_revenue").desc()) df = ( df .groupby("customer") .agg( F.sum("net_revenue").alias("net_revenue") ) .withColumn('helper', F.lit(1)) .withColumn( "cumulative_revenue", F.sum("net_revenue").over(window) ) ) ''' – MCHE Dec 26 '21 at 19:49

1 Answer


Functions doesn't currently offer a quick and concise way to do this. The .groupBy(), .sum(), and .segmentBy() functions won't help you with exactly what you are trying to do.

In addition, we purposefully limit the number of grouped results on a string (in your case, customer) to 1000.

The best way for you to tackle this is to iterate through each row of the raw data, building a map from each customer to its revenue and cumulative revenue.

After that (if I'm understanding what you want to do correctly), you can edit the object that each of those calculated revenues refers to with something like data.revenue = newRevenue.

Here is a code sample to give you an idea of how you could do this:

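// Import paths as in a standard Functions repository; adjust if yours differ
import { Function, FunctionsMap, Integer } from "@foundry/functions-api";
import { Objects } from "@foundry/ontology-api";

interface CustomerRevenueData {
    revenue: Integer;
    cumulativeRevenue: Integer;
}

@Function()
public cumulativeRevenueAnalysis(): void {
    // Replace initialDataObject with your own object type
    const data = Objects.search().initialDataObject().all();

    // The string key represents the customer name
    const customerRevenueData = new FunctionsMap<string, CustomerRevenueData>();

    data.forEach(dataPoint => {
        // your logic here, updating the revenue/cumulative revenue for each customer you find
    });
}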
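
To make the "your logic here" step more concrete, here is a minimal sketch in plain TypeScript. It assumes the rows have already been loaded into an ordinary array, and it deliberately uses `number` and the built-in `Map` instead of the Foundry types so that it can run anywhere. The names `RevenueRow`, `CustomerCumulativeRevenue`, and `cumulativeRevenueByCustomer` are hypothetical and only serve the illustration; this is not a Functions API.

```typescript
// Hypothetical row shape; the field names are assumptions for illustration.
interface RevenueRow {
    customer: string;
    revenue: number;
}

interface CustomerCumulativeRevenue {
    customer: string;
    revenue: number;
    cumulativeRevenue: number;
}

function cumulativeRevenueByCustomer(rows: RevenueRow[]): CustomerCumulativeRevenue[] {
    // Step 1: sum revenue per customer (the groupby/agg step in the PySpark code).
    const totals = new Map<string, number>();
    for (const row of rows) {
        totals.set(row.customer, (totals.get(row.customer) ?? 0) + row.revenue);
    }

    // Step 2: order customers by total revenue, descending.
    const sorted = Array.from(totals.entries()).sort((a, b) => b[1] - a[1]);

    // Step 3: walk the sorted list once, carrying a running total.
    let running = 0;
    return sorted.map(([customer, revenue]) => {
        running += revenue;
        return { customer, revenue, cumulativeRevenue: running };
    });
}

// The raw data from the question.
const rawData: RevenueRow[] = [
    { customer: "Customer A", revenue: 10 },
    { customer: "Customer A", revenue: 10 },
    { customer: "Customer B", revenue: 5 },
    { customer: "Customer B", revenue: 2 },
    { customer: "Customer C", revenue: 5 },
];

const result = cumulativeRevenueByCustomer(rawData);
console.log(result);
// Customer A: 20 / 20, Customer B: 7 / 27, Customer C: 5 / 32
```

One difference from the PySpark version: with its default window frame, Spark gives customers with tied revenue the same cumulative value, whereas this running total increases row by row. The sketch also does nothing about the row-count and timeout limits; it only shows the aggregation logic.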
  • Hi, thanks for the code. I can figure out the rest based on what you suggested above. However, my current data has more than 1 million rows, and apparently doing a forEach here leads to a timeout in Functions on Foundry. Am I reaching the edge of the platform? – MCHE Jan 08 '22 at 15:03
  • It depends. In which context are you using this function? Is there any way you can calculate your results, say, 200 objects at a time? – Ryan Arifin Jan 18 '22 at 11:35
  • Building on @RyanArifin's suggestion, here are 3 typical ways of achieving this: 1. Drastically reduce the size of the base object set you're aggregating by letting the user select specific segments of data (e.g. time periods/markets/customer categories) before applying the function. 2. Create an additional object type that's already aggregated to some extent via a data transform, and then apply the function to this object type. 3. Let users precalculate some aggregations using a first function and capture the results in variables. Then use these variables as inputs to a second function. – Benjamin Ahnert Jan 18 '22 at 12:00
  • The overall context is that we have, let's say, 5M customers in the world, and we want to be able to say that the top x% of customers account for y% of revenue. I am looking into the following capabilities: 1. Users filter the customers by various dimensions, maybe down to 1M. 2. Users view a chart where the x-axis is the customer percentile by descending revenue and the y-axis is the cumulative revenue, computed dynamically from their selection. 3. Based on that chart, users should be able to filter for different percentiles and dynamically view the table of customers below. – MCHE Jan 19 '22 at 13:56
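
The chart and percentile filter described in the last comment can be sketched in plain TypeScript as below. This is only an illustration under two assumptions: that per-customer revenue totals are already available (for example from a pre-aggregated object type, as suggested above), and that they fit in memory. The names `CustomerTotal`, `ParetoPoint`, `paretoCurve`, and `topCustomers` are hypothetical and not part of any Foundry API.

```typescript
// Hypothetical shape of an already-aggregated customer total.
interface CustomerTotal {
    customer: string;
    revenue: number;
}

interface ParetoPoint {
    customerPercentile: number; // share of customers included so far, 0 to 100
    revenueShare: number;       // share of total revenue they account for, 0 to 100
}

// One chart point per customer, ordered by descending revenue.
function paretoCurve(totals: CustomerTotal[]): ParetoPoint[] {
    const sorted = totals.slice().sort((a, b) => b.revenue - a.revenue);
    const grandTotal = sorted.reduce((sum, t) => sum + t.revenue, 0);
    if (sorted.length === 0 || grandTotal === 0) {
        return [];
    }

    let running = 0;
    return sorted.map((t, i) => {
        running += t.revenue;
        return {
            customerPercentile: ((i + 1) / sorted.length) * 100,
            revenueShare: (running / grandTotal) * 100,
        };
    });
}

// The customers that fall within the top `percent` of the ranking.
function topCustomers(totals: CustomerTotal[], percent: number): CustomerTotal[] {
    const sorted = totals.slice().sort((a, b) => b.revenue - a.revenue);
    const count = Math.ceil((percent / 100) * sorted.length);
    return sorted.slice(0, count);
}

// Per-customer totals from the question's target data.
const customerTotals: CustomerTotal[] = [
    { customer: "Customer A", revenue: 20 },
    { customer: "Customer B", revenue: 7 },
    { customer: "Customer C", revenue: 5 },
];

const curve = paretoCurve(customerTotals);
const topHalf = topCustomers(customerTotals, 50);
console.log(curve, topHalf);
```

With the question's data, the top customer (one third of the customers) accounts for 62.5% of revenue, and the top 50% filter returns Customer A and Customer B. At millions of customers, one point per customer is likely too many for a chart, so the curve would probably need to be sampled at fixed percentile steps.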