1

Let's say I a table that contains the date, account, product, purchase type, and amount like below: Sample Table

Looking at this table, you can see that for any particular account/product combination, there are buys and sells. Essentially, what I'd like to write is a SQL query that flags the following: Are there accounts that bought at a certain amount and then sold the same aggregate amount or more 30 days from that buy?

So for example, we can see account 1 bought product A for 20k on 8/1. If we look at the running sum of sells by account 1 for product A over the next 30 days, we see they sold a total of 20k - the same as the initial buy:

enter image description here

Ideally, the query would return results that flag all of these instances: for each individual buy, find all sells for that product/account 30 days from that buy, and only return rows where the running total of sells is greater than or equal to that initial buy.

EDIT: Using the sample data provided, the desired should look more or less look like the following:

enter image description here

You'll see that the buy on 8/2 for product B/account 2 is not returned because the running sum of sells for that product/account/buy combination over the next 30 days does not equal or exceed the buy amount of 35k but it does return rows for the buy on 8/3 for product B/ account 2 because the sells do exceed the buy amount of 10k.

I know I need to self join the sells against the buys, where the accounts/products equal and the datediff is less than or equal 30 and I basically have that part structured. What I can't seem to get working is the running total part and only returning data when that total is greater than or equal to that buy. I know I likely need to use the over/partition by clauses for the running sum but I'm struggling to produce the right results/optimize properly. Any help on this would be greatly appreciated - just looking for some general direction on how to approach this.

Bonus: Would be even more powerful to stop returning the sells once the running total passes the buy, so for example, the last two rows in the desired output I provided aren't technically needed - since the first two sells following the buy had already eclipsed the buy amount.

Dale K
  • 25,246
  • 15
  • 42
  • 71
nunga
  • 31
  • 5
  • What if there is another "buy" within the 30 days period? – GMB Sep 08 '20 at 22:31
  • @GMB the query would aim to look at each individual buy, so if there are multiple buys for the same account/product, we would just be looking at the sells 30 days from each one (i.e if a buy happened 9/1, we'd look at the corresponding sells for that account/product up until 9/2 and see if the aggregate exceeds the buy on 9/1. If the same account/product bought on 9/2, we'd do the same thing, up until 10/2 or whatever 30 days from that date would be. – nunga Sep 08 '20 at 22:37
  • @DaleK - apologies on etiquette, I don't use this very often – nunga Sep 08 '20 at 22:38
  • 3
    Don't apologise - just update/improve your question :) – Dale K Sep 08 '20 at 22:42

1 Answers1

1

In SQL Server, one option uses a lateral join:

select 
    t.*, 
    case when t.amount = x.amount then 1 else 0 end as is_returned
from mytable t
cross apply (
    select sum(amount) amount
    from mytable t1
    where 
        t1.purchase_type = 'Sell'
        and t1.account = t.account 
        and t1.product = t.product
        and t1.date >= t.date
        and t1.date <= dateadd(day, 30, t.date)
) x
where t.purchase_type = 'Buy'

The lateral join sums the amount of "sells" of the same account and product within the following 30 days, which you can then compare with the amount of the buy. The query gives you one row per buy, with a boolean flag that indicates if the amounts match.

In databases that support the range specification to window functions, this would be more efficiently expressed with a window sum:

select *
from (
    select 
        t.*,
        case when amount = sum(case when purchase_type = 'Sell' then amount end) over(
            partition by account, product
            order by date
            range between current row and interval '30' day following
        ) then 1 else 0 end as is_returned
    from mytable t
) t
where purchase_type = 'Buy'

Edit: this would generate a resultset similar to the third table in your question:

select t.*, x.*
from mytable t
cross apply (
    select 
        t1.date sale_date, 
        t1.amount sell_amount, 
        sum(t1.amount) over(order by t1.date) running_sell_amount,
        sum(t1.amount) over() total_sell_amount
    from mytable t1
    where 
        t1.purchase_type = 'Sell'
        and t1.account = t.account 
        and t1.product = t.product
        and t1.date >= t.date
        and t1.date <=  dateadd(day, 30, t.date)
) x
where t.purchase_type = 'Buy' and t.amount = x.total_sell_amount
GMB
  • 216,147
  • 25
  • 84
  • 135
  • thanks! that was quick! let me test this in a few. However, I think for this use case, I'd want to see each individual sell similar to the desired ouptut - not just one row per buy. How would that alter the above query? – nunga Sep 08 '20 at 22:44
  • 2
    @nunga: you did not show the result that you want, so this was my understanding. You might need to [edit your question](https://stackoverflow.com/posts/63802306/edit) to display the expected result. – GMB Sep 08 '20 at 22:46
  • it is the third table in the post. Sorry, I'm not very good at structuring these. Ideally, the query would return one row per buy/sell combination only for sells that once aggregated, exceed or are equal to the initial buy (30 days from that buy). I'll try to edit the question in a bit to improve – nunga Sep 08 '20 at 22:49
  • @nunga: ok, I understand. It was not obvious to me that this was the desired results. See the edit in my answer. – GMB Sep 08 '20 at 22:57
  • Thanks @GMB - yea it wasn't clear, apologies. i'm about to test this, however, doesn't it also need a condition to only return rows where the running total is equal to or greater than the initial buy amount? – nunga Sep 08 '20 at 23:19
  • @nunga: fixed in the query. – GMB Sep 08 '20 at 23:30
  • thank you, this is really close but doesn't seem to work, I think the running sum is slightly off. the running sum should restart for each buy/product/account combination - since we are only looking to flag where aggregate sells are equal or greater than each buy for the same account/product. I think maybe a partition clause with product/account/buy date would work? – nunga Sep 08 '20 at 23:36
  • @nunga: for the sample data that you showed, this seems to produce the results you showed. If you want to refine your requirement, then you might consider asking [a new question](https://stackoverflow.com/questions/ask). – GMB Sep 08 '20 at 23:44
  • ok thanks, however it does not produce the same results. In the desired output i provided, the running sum is per buy/product/account combo. Also, the cross apply seems to be extremely performance heavy - i only selected the top 10 and its on 15 min so far. I think I can use what you have as a foundation. I will reply if I figure it out- thank you so much for your help! – nunga Sep 08 '20 at 23:57