1

I'm trying to do a sort of running total with a variable window size. I want to sum a value from the current row to the last time another row met a condition.

Mock Data: I want a formula for the last column

ID condition_col col_to_sum running_total_since_last_true
01 TRUE 10 10
02 FALSE 8 18
03 FALSE 12 30
04 TRUE 5 5
05 FALSE 8 13
06 FALSE 10 23
07 FALSE -2 21
08 TRUE 1 1
09 FALSE 1 2

What I have tried:

select ID
     , condition_col
     , col_to_sum
     , (
        select sum(col_to_sum) as total
          from tbl t1
         where t1.ID between tbl.ID and (select top 1 t2.ID 
                                           from tbl t2 
                                          where condition_col = TRUE 
                                       order by t2.ID))
        ) as running_total_since_last_true
  from tbl

Because of the size of the dataset and the other things going on in the query (not shown here) this resulted in the query taking several minutes to run for a few thousand rows.

Now I'm looking at using sum() over(rows between current row and (select...)

I feel like I'm on the right track, but I keep getting stuck.

I also tried lag() but I couldn't make the lag reference itself properly.

Any ideas?

2 Answers2

1

You would be better off putting your data into groups first and then using a cumulative SUM. You can put your data into groups by using a conditional cumulative COUNT (inside a CTE):

WITH Grps AS(
    SELECT ID,
           condition_col,
           col_to_sum,
           COUNT(CASE condition_col WHEN 'TRUE' THEN 1 END) OVER (ORDER BY ID ASC
                                                                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp
    FROM (VALUES(01,'TRUE',10,10),
                (02,'FALSE',8,18),
                (03,'FALSE',12,30),
                (04,'TRUE',5,5),
                (05,'FALSE',8,13),
                (06,'FALSE',10,23),
                (07,'FALSE',-2,21),
                (08,'TRUE',1,1),
                (09,'FALSE',1,2))V(ID,condition_col,col_to_sum,running_total_since_last_true))
SELECT ID,
       condition_col,
       col_to_sum,
       SUM(col_to_sum) OVER (PARTITION BY Grp ORDER BY ID
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total_since_last_true
FROM Grps;
Thom A
  • 88,727
  • 11
  • 45
  • 75
  • Perfect. Not only did this solve the problem, but the total runtime for my query actually dropped somehow (compared to when I didn't include this column at all.) Thank You! – Glen Hamblin Jul 20 '23 at 15:51
1

Heres some example DDL/DML:

DECLARE @Table TABLE (ID INT, condition_col BIT, col_to_sum INT);
INSERT INTO @Table (ID, condition_col, col_to_sum) VALUES 
(01, 1, 10), (02, 0, 8), (03, 0, 12), (04, 1, 5), (05, 0, 8), (06, 0, 10), (07, 0, -2), (08, 0, 1), (09, 0, 1);

It makes it much easier for folks to answer questions when we already have this.

This is a pretty typical problem, one way to solve it is using windowed functions:

SELECT ID, condition_col, col_to_sum, SUM(Col_To_Sum) OVER (PARTITION BY Cnt ORDER BY ID)
  FROM (
        SELECT ID, condition_col, col_to_sum, COUNT(CASE WHEN condition_col = 1 THEN 1 END) OVER (ORDER BY ID) AS cnt
          FROM @Table
       ) a
 ORDER BY id

Here we're using a windowed function to group the rows by an pseudo-ID, and then using that as the partition for the sum.

ID condition_col col_to_sum (No column name)
1 1 10 10
2 0 8 18
3 0 12 30
4 1 5 5
5 0 8 13
6 0 10 23
7 0 -2 21
8 0 1 22
9 0 1 23
Patrick Hurst
  • 2,086
  • 1
  • 3
  • 13
  • Thank You! If I could check two solutions, I would. I ended up going with @Thom A 's solution because I was already working with WITH statements, and the real query was quite large and complex. Re: DDL/DML Thanks for the Tip. Will do in the future. – Glen Hamblin Jul 20 '23 at 15:54