0

It's probably pointless posting this as I can't show the actual code causing the problem, and due to not knowing the source of the issue have not been able to recreate it using fake data, but wanted to ask in case anyone has seen something similar.

I have a query that has a SUM windows function that returns different values each time it's run, despite no changes in the code or underlying data. Literally the same statement run two seconds apart will return a seemingly random selection from 4 to 7.

I can't print the actual code but the line that produces the varying results is as follows, where the column names have been changed: -

SUM(UNITS) OVER (PARTITION BY ACCOUNT_KEY, TRANSACTION_DATE 
    ORDER BY ACCOUNT_KEY,TRANSACTION_DATE)  NUMBER_OF_UNITS      

If anyone can please offer any suggestions on how to approach this or if you've had a similar issue, please let me know. I realise that without the full code it's difficult-to-impossible to know for sure, but there's a chance that someone has seen a similar problem and might be offer some guidance.

Simeon Pilgrim
  • 22,906
  • 3
  • 32
  • 45

1 Answers1

3

Your ORDER BY keys are the same as your PARTITION BY keys. So, there is no ordering -- everything in a window frame has the same sort key values.

You are doing a cumulative sum on an unstable sort. So the rows can be -- and are -- in any arbitrary order.

Presumably, you want a cumulative sum of UNITS for each account, order by the date:

SUM(UNITS) OVER (PARTITION BY ACCOUNT_KEY ORDER BY TRANSACTION_DATE) as NUMBER_OF_UNITS

Assuming that TRANSACTION_DATE is unique, this will be stable. Note that by default, equal transaction dates for an account as all appearing together because the default windowing clause is RANGE:

SUM(UNITS) OVER (PARTITION BY ACCOUNT_KEY
                 ORDER BY TRANSACTION_DATE
                 RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
                ) as NUMBER_OF_UNITS
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thanks for this reply. I didn't know (or forgot) that adding an order by to this made it cumulative - this is very useful to bear in mind which I'll try to do.(+1) Unfortunately it doesn't seem to be the cause. When I run the query with no SUM functions and only the ACCOUNT_KEY then sometimes it produces 7 rows, other times 3. Basically the issue seems to be on one of the JOINs so my original question isn't as useful as hoped. – SnakePliskin Mar 30 '21 at 10:59
  • 1
    @SnakePliskin . . . This answers the question that you asked here. If you have a question about a `JOIN`, then ask a new question. – Gordon Linoff Mar 30 '21 at 12:17
  • Thanks @Gordon Linoff. This was very helpful. Although the SUM function doesn't return varying row counts due to this issue - varying numbers of rows are returned even without the SUM function - it does point the way in how to approach this, given that some of the joins appear to employ a similar logic. Nevertheless,this SUM function will still need to be revised since it might not work properly - as you correctly say - even with a stable number of rows. – SnakePliskin Apr 01 '21 at 07:36