2

I'm using Postresql 9.6 and am trying to figure out how to create a query to calculate the sum of a column based on the discrete sum of a different column (both in the same table). For example, I want the sum of counts for each set of rows until the volume >= 100 and then to start calculating the sum of counts over again.

Example data:

id          count       volume
1           5           12
2           6           14
3           2           11
4           10          9
5           5           14
6           17          19
7           0           8
8           12          4
9           18          6
10          12          14
11          10          10
12          15          7
13          8           12
14          2           17
15          5           30
16          9           24
17          2           16.5

Hypothetical intermediate results (just to understand what I expect):

id         sum(count)  discrete volume
1          5           12
2          11          26
3          13          37
4          23          46
5          28          60
6          45          79          
7          45          87
8          57          91
9          75          97
10         87          111
11         10          10   (reset since volume >= 100)
12         25          17
13         33          29
14         35          46
15         40          76
16         49          100
17         2           16.5   (reset since volume >= 100)

Expected final results:

sum(count) discrete volume
87         111
49         100
2          16.5   (partial result, which is desired)

What I have tried so far: SQL Fiddle

I got idea for the current query shown in SQL Fiddle from another StackOverflow answer and this obviously isn't working correctly (however it is working quite nicely for a different problem I was trying to solve). I saw from a different StackOverflow answer that I probably want to be using a RECURSIVE query (PostgreSQL Documentation) but I can't figure out exactly how to write the query correctly so it works :(

I certainly can write this to work in Java code (and already have), but I want to do it in SQL so it is hopefully faster than reading all the rows and calculating the results. I can also most likely write a stored procedure to make this work, but I would rather like to avoid that as I'm using JPA (in this case with a native query) and want to keep all the code in the same place (the Java codebase). Also, I want to be able to dynamically include/exclude columns, so I would like to construct the query in Java code (the actual table has many more columns than my example).

Thanks for any help you can give.

EDIT:

Thanks to @klin's comment and looking at the referenced StackOverflow issue, I'm getting closer. Here is what I have (generates intermediate results):

WITH RECURSIVE WorkTable(id, count_sum, volume_sum) AS 
(
    SELECT 
        id,
        count AS count_sum,
        volume AS volume_sum
    FROM measurements
    WHERE id = 1
    UNION ALL
    SELECT
        measurements.id,
        CASE WHEN WorkTable.volume_sum >= 100
            THEN measurements.count
            ELSE WorkTable.count_sum + measurements.count
        END AS count_sum,
        CASE
            WHEN WorkTable.volume_sum >= 100
            THEN measurements.volume
            ELSE WorkTable.volume_sum + measurements.volume
        END AS discrete_sum_volume
        FROM measurements
        JOIN WorkTable
        ON measurements.id = WorkTable.id + 1
)
SELECT *
FROM WorkTable
ORDER BY id

However, what I'm still missing is how to get the final result. If I use WHERE volume_sum >= 100 I don't get the final (partial) result. And I can't use OR id = MAX(id) because Postgres doesn't allow that in a WHERE clause.

EDIT: BTW after all the work and time that @klin and @JorgeCampos spent looking at and answering my question, I found that this query doesn't perform for a table with millions of rows. I have created a stored procedure instead; I didn't want to have to go there but there seems to be no other well performing alternative. The stored procedure out-performs the RECURSIVE query by many orders of magnitude for a large table.

  • You mean `until volume <= 100` right? – Jorge Campos Apr 20 '18 at 17:44
  • @JorgeCampos - I tried editing the question based on your question and it seemed to read much less clearly. The reset condition is `volume >= 100` – Sometimes_Confused Apr 20 '18 at 17:55
  • There are two approaches to similar problems - iterative and recursive. Examples on both can be found in this thread [Find Value per number based on Above row result.](https://stackoverflow.com/a/49640488/1995738) – klin Apr 20 '18 at 18:19
  • @klin thanks for your response. the iterative approach seems to require a stored procedure which I'm trying to avoid (for the reason see the text of the question). I would like a recursive solution if it is possible (I'm not sure). I've been hacking on the recursion query for almost a solid day now, but so far nothing I try is working. – Sometimes_Confused Apr 20 '18 at 18:47
  • For your current attempt the solution would be to create a row number based on that case statement and get only the rows with number=1, however you are assuming for your query that there is no gap on the ids and I particularly disagree with it. Any delete or manual insert on that table would break your functionality entirely. – Jorge Campos Apr 20 '18 at 19:30
  • @JorgeCampos I agree completely with your comments about sequential identifiers, that will be a real problem in my case since I'm not using sequences and have other identifiers that I will be using to restrict the query. However, I was just trying to solve the simple problem before solving the more complex one :) Any suggestions on how to solve for the general case would be MOST welcome! – Sometimes_Confused Apr 20 '18 at 19:32
  • Yeah, I'm trying as well. I'm using window functions, maybe your solution with mine can lead to the final? Take a look: http://sqlfiddle.com/#!17/dc63a/80 – Jorge Campos Apr 20 '18 at 19:38
  • Andd I renamed the count column to countt since count is a reserved word – Jorge Campos Apr 20 '18 at 19:39

1 Answers1

3

I have used row_count() as consecutive row numbers are needed. You should not rather rely on the primary key, because it usually may contain gaps.

with recursive cte as (
    select *, row_number() over (order by id)
    from measurements
),
work_table as (
    select 
        id,
        count as count_sum,
        volume as volume_sum,
        row_number
    from cte
    where row_number = 1
    union all
    select
        c.id,
        case when w.volume_sum >= 100
            then c.count
            else w.count_sum + c.count
        end as count_sum,
        case
            when w.volume_sum >= 100
            then c.volume
            else w.volume_sum + c.volume
        end as discrete_sum_volume,
        c.row_number
    from cte c
    join work_table w
    on c.row_number = w.row_number + 1
)
select count_sum, volume_sum
from work_table
where volume_sum >= 100
or id = (select max(id) from work_table)
order by id

Result:

 count_sum | volume_sum 
-----------+------------
        87 |        111
        49 |        100
         2 |       16.5
(3 rows)

SqlFiddle.

klin
  • 112,967
  • 15
  • 204
  • 232