
I have a table

create table emp
(
    id int,
    salary int
)

I need to return all rows whose rolling sum (ordered by id) is less than a given input X.

The query I have come up with is:

SELECT *
FROM
    (SELECT id, salary, SUM(salary) OVER (ORDER BY id ASC) AS rollingSum
     FROM emp) AS temp_view
WHERE rollingSum < X

It correctly returns what I need.

But the subquery computes the rolling sum all the way to the last row, even after the threshold has been reached, so every row in the table gets scanned.

Is there a way to stop computing the rolling sum as soon as the threshold is reached?
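For anyone who wants to reproduce this, here is a minimal sketch in Python using the stdlib `sqlite3` module (SQLite also supports window functions since 3.25); the table contents and the X value are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INT, salary INT)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [(1, 100), (2, 200), (3, 300), (4, 400)])

X = 650  # rolling sums are 100, 300, 600, 1000; only ids 1-3 stay under 650
rows = conn.execute("""
    SELECT id, salary, rollingSum
    FROM (SELECT id, salary,
                 SUM(salary) OVER (ORDER BY id ASC) AS rollingSum
          FROM emp) AS temp_view
    WHERE rollingSum < ?
""", (X,)).fetchall()
print(rows)  # [(1, 100, 100), (2, 200, 300), (3, 300, 600)]
```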

pvjhs

1 Answer


I'm reasonably confident this isn't possible because of the ORDER BY in your statement: there is no way to know which entries come first until you've searched through all of them. If there is an index on that column, maybe a full scan wouldn't be needed. Even then, though, you wouldn't want to interrupt a single continuous read of the table to do row-by-row processing. SQL Server has a lot of features, so it's entirely possible I'm missing something.

You might also be able to do something by persisting the rolling sum in a column. I'm guessing you want to stop early purely to save on performance cost, in which case that solution probably isn't going to help, but if the data only changes once a month and the select runs 100 times a month, the upfront cost could become worthwhile.
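A sketch of that precomputed-column idea, again using Python's `sqlite3` for illustration (the column and index names are my own invention): the UPDATE pays the rolling-sum cost once per data change, after which each read is a plain range predicate on an indexed column and can stop as soon as the threshold is passed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INT, salary INT, rollingSum INT)")
conn.executemany("INSERT INTO emp (id, salary) VALUES (?, ?)",
                 [(1, 100), (2, 200), (3, 300), (4, 400)])

# Pay the rolling-sum cost once, whenever the data changes.
conn.execute("""
    UPDATE emp
    SET rollingSum = (SELECT SUM(e2.salary) FROM emp AS e2 WHERE e2.id <= emp.id)
""")
conn.execute("CREATE INDEX ix_emp_rollingSum ON emp (rollingSum)")

# Every subsequent read is a simple indexed range scan.
rows = conn.execute(
    "SELECT id, salary FROM emp WHERE rollingSum < ? ORDER BY id", (650,)
).fetchall()
print(rows)  # [(1, 100), (2, 200), (3, 300)]
```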

  • This is just a dummy example. The real data is highly dynamic and this query needs to run 10-50K times a day, so a separate rolling sum column won't be possible. – pvjhs Aug 22 '19 at 17:36
  • "because of the order by in your statement. There is no way to know what the top entries are until you've searched through all of them" But the sum must be maintained somewhere locally as the engine walks the rows in a particular order, so I thought it might be possible. – pvjhs Aug 22 '19 at 17:38
  • I figured it probably was highly dynamic, frequently used data and probably wouldn't work; otherwise you wouldn't be bothering to make this more performant. The sum is generated by iterating over the ordered rows, and that can't finish until the whole table has been read. Any kind of OVER operation is significantly slower than simply retrieving the data, so you wouldn't want to stop and restart the data retrieval to perform a new OVER computation after each row. – Cody McPheron Aug 22 '19 at 17:55
  • But if id has a B-Tree index and is the column the OVER (ORDER BY) uses, a whole table scan won't be required. I believe you mentioned the same. – pvjhs Aug 22 '19 at 18:28
  • I agree that avoiding a whole table scan might be possible, but the index isn't related to the value you are calculating here. If you needed the top 50 records ordered by column_a, and it's indexed on column_a, then you shouldn't need to scan past those 50 records. In this case, though, you have a generated value, and I can't imagine that the optimizer will stop a relatively fast operation like an index seek to start the much slower row-by-row comparison. – Cody McPheron Aug 22 '19 at 20:35
  • Perhaps consider selecting your results into a temp table first if you have an absolutely massive table? If 99% of operations finish within the first 5000 rows and you have 5bil rows in the table, then select against a 5000-row temp table by default and fall back to the slower version when the sum isn't reached within that temp table. It feels a little hacky to do that, admittedly, but it's easy enough to implement and explain to others. – Cody McPheron Aug 22 '19 at 20:41
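A rough sketch of that hybrid approach, once more in Python/SQLite for illustration (the cutoff of 10 rows, the `head` table name, and the helper function are all made up; in practice the cutoff would be something like 5000):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INT, salary INT)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [(i, 100) for i in range(1, 101)])  # 100 rows of salary 100

CUTOFF = 10  # sized so the vast majority of queries finish inside it

def rows_under(conn, x):
    # Fast path: rolling sum over only the first CUTOFF rows by id.
    conn.execute("DROP TABLE IF EXISTS head")
    conn.execute(f"""
        CREATE TEMP TABLE head AS
        SELECT id, salary, SUM(salary) OVER (ORDER BY id) AS rollingSum
        FROM (SELECT id, salary FROM emp ORDER BY id LIMIT {CUTOFF})
    """)
    total = conn.execute("SELECT MAX(rollingSum) FROM head").fetchone()[0]
    if total is not None and total >= x:
        # The threshold was reached inside the cutoff, so every qualifying
        # row is already in the small temp table.
        return conn.execute(
            "SELECT id, salary FROM head WHERE rollingSum < ?", (x,)).fetchall()
    # Slow path: the sum wasn't reached within the cutoff; scan everything.
    return conn.execute("""
        SELECT id, salary
        FROM (SELECT id, salary, SUM(salary) OVER (ORDER BY id) AS rollingSum
              FROM emp)
        WHERE rollingSum < ?
    """, (x,)).fetchall()

print(len(rows_under(conn, 450)))   # fast path: 4 rows
print(len(rows_under(conn, 5000)))  # slow path: 49 rows
```

The fast path is safe because salaries here are non-negative: if the rolling sum inside the cutoff already reaches X, no later row can still be under it.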