0

I have streaming data coming from eventhub as follows:

product  price  quantity 
 1        55      100
 2        44      200
 2        43      200
 1        60      300
 2        55      100
 2        44      50
 2        47      100
 1        44      100
 1        48      100

Now I want to calculate the volume weighted average price grouped by product but always only of the last 500 units of traded volume (not of all values).

Calculating based on a time windows is straight forward but I want to only aggregate over the last 500 units of traded volume instead.

So I am looking for some kind of windowing function without time constraint. Is that existing or even possible with stream analytics?

josh66
  • 1

1 Answers1

0

The formula for weighted average is:

weighted_average = sum of (value * weight) / sum of weights

Since you have timestamp, you can use that field to compute the weighted average of price based on quantity column for each product with a window frame that includes all rows within the last 500 seconds. Below is the query

Query:

with Stgquery as(
select
product,  SUM(price)  OVER  (PARTITION  BY product LIMIT  DURATION  (second, 500))  as denom,
sum(cast(price as float)  *  cast(quantity as float))  over  (PARTITION  by product LIMIT  DURATION  (second, 500))  as num
from
input)
select product,weighted_average=(num/denom)  from Stgquery

In this query, CTE stgquery selects the product, the cumulative sum of the price column for each group of product values, and the sum of the product of the price and quantity columns for each group of product values ( using the SUM function with a window frame that includes all rows within the last 500 seconds. This can be changed as per requirement). The SELECT statement selects the product column and calculates the weighted average of the price column based on the quantity column for each group of product values. The weighted average is calculated by dividing the sum of the product of the price and quantity columns (stored in the num column of the Stgquery CTE) by the cumulative sum of the price column (stored in the denom column of the Stgquery CTE). The result is returned as a new column named weighted_average.

Output:

product weighted_average
1 100
2 200
2 200
1 204.3478
2 161.2676
2 134.9462
2 127.897
1 175.4717
1 157.971
Aswin
  • 4,090
  • 2
  • 4
  • 16
  • This is the vwap over the last 500 seconds but not what I am looking for. I want the vwap over the last 500 quantitites. – josh66 Jun 28 '23 at 07:49
  • 500 quantities -- the last row itself has 100 quanties. if you have 500 quanties in one row, should you consider that only? Could you explain – Aswin Jun 28 '23 at 08:45
  • yes, if you have eg. 500 quanties in one row this is the only record you should take into account for the vwap. as long as the sum of quantities is less than 500 take the vwap of all records. If a new event is coming in and the limit of 500 quantities is exceeded, remove the first event (regarding tms) from the group to calculate the vwap with the new record – josh66 Jun 28 '23 at 14:43
  • Have you tried any approach? do you have atleast partial working code? This could help me to understand the scenario better. – Aswin Jun 28 '23 at 15:01