3

How is it possible to return rows with an accumulate sum for a row bigger or smaller than a specified value?

table:

id | count
-----------
1 | 30
2 | 10
3 | 5
4 | 20
5 | 15

query:

SELECT id, count
FROM table
ORDER BY id
HAVING SUM(count) < 50

return rows:

id | count
-------------
1 | 30
2 | 10
3 | 5

update

code:

public function query(){
    switch($this->table){
        case 'in_stock':
            return "SELECT * FROM ".Init::$static['db'].".stock
                WHERE id<=dynaccount.stock_first_available_id(".$this->value['num_stock'].", ".$this->value['product_id'].", ".(isset($this->att_arr['gid']) ? $this->att_arr['gid']:$_SESSION['gid']).")
                ORDER BY time, id";
    }
}

procedure:

DELIMITER $$

DROP FUNCTION IF EXISTS `stock_first_available_id` $$
CREATE DEFINER=`dynaccount`@`localhost` FUNCTION `stock_first_available_id`(_running_total_limit INT, _product_id INT, _group_id INT) RETURNS INT
BEGIN
    DECLARE _running_count INT default 0;
    DECLARE _id INT;
    DECLARE _current_id INT;
    DECLARE _sum_count INT;

    IF (SELECT COUNT(*) FROM stock WHERE group_id=_group_id && type=2 && product_id=_product_id) = 0 THEN
        RETURN 0;
    END IF;

    DECLARE _cur CURSOR FOR SELECT id, count FROM stock WHERE group_id=_group_id && type=2 && product_id=_product_id ORDER BY time DESC, id DESC;

    OPEN _cur;

    read_loop: LOOP
        FETCH _cur INTO _id, _sum_count;

        SET _running_count = _running_count + _sum_count;
        SET _current_id = _id;

        IF _running_count > _running_total_limit THEN
            LEAVE read_loop;
        END IF;
    END LOOP read_loop;

    CLOSE _cur;

    RETURN _current_id;
END $$

DELIMITER ;

error:

#1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'DECLARE _cur CURSOR FOR SELECT id, count FROM stock WHERE group_id=_group_id &amp;&amp; ' at line 12 
clarkk
  • 27,151
  • 72
  • 200
  • 340
  • @mu, what do you mean? isn't it possible to do in mysql? – clarkk Jan 01 '12 at 00:23
  • Take a look at http://stackoverflow.com/questions/1135627/mysql-select-accumulated-column. You could then select all rows where the accumulated sum is less than the total. – dash Jan 01 '12 at 00:25
  • Calculating running totals for large amounts of rows is inefficient in SQL generally (unless they support the full `OVER` clause). I have a feeling there might be a way of doing this requirement reasonably efficiently in MySQL using user variables though. – Martin Smith Jan 01 '12 at 00:25
  • @martin, what do you mean? could you provide an example with the `OVER` clause? – clarkk Jan 01 '12 at 00:35
  • @clarkk - MySQL **doesn't** support the `OVER` clause but syntax if it did would involve `SUM(count) OVER(ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)` – Martin Smith Jan 01 '12 at 00:39
  • This is probably possible in MySQL but window functions (as Martin Smith suggests) are the usual approach to this sort of thing. – mu is too short Jan 01 '12 at 00:43
  • 1
    @MartinSmith Yes, definitely, it's a real shame MySql doesn't support the over clause as it makes this trivial. Otherwise you need a running total or (urghh) a cursor, I think. All the best for 0b11111011100 by the way! – dash Jan 01 '12 at 00:44
  • @martin, ok.. but if you should recommend a database with more possibilities than mysql, which would it be? – clarkk Jan 01 '12 at 00:48
  • @clarkk Look at Martin's tag history - it's going to be SQL Server :-). That's my choice too, though, but mainly because that's what I know and I develop for the .Net platform. It is a great RDBMS. – dash Jan 01 '12 at 00:49
  • @dash - Though SQL Server doesn't get the `ROWS/RANGE` functionality needed for running totals until Denali comes out. Postgresql is free / open source and seems to have a lot more functionality than MySQL as far as I'm aware. – Martin Smith Jan 01 '12 at 10:52
  • @MartinSmith I've heard lots of good things about Postgres but I also remember there were quite a few bugs with the driver when doing some .Net -> Postgres stuff a few years back. Sure it's better now. As for MySql vs Postgres - as with all things it seems to depend - http://olex.openlogic.com/wazi/2011/postgresql-vs-mysql-which-is-the-best-open-source-database/. I think I'll get to know Postgres better in the new year, so thanks for the comment :-) – dash Jan 01 '12 at 12:11

1 Answers1

6

The following query:

SELECT * FROM 
(SELECT id, 
       count, 
       @running_count := @running_count + count AS Counter 
  FROM sumtest, (SELECT @running_count := 0) AS T1 ORDER BY id) AS TableCount 

WHERE TableCount.Counter < 50;

produces the results:

id  count   Counter
1   30      30
2   10      40
3   5       45

I copied your table into MySql and called it "sumtest" btw. Please replace with your table name.

Effectively, we work out the running total, in id order, then use that as a subquery.

So this query:

SELECT id, 
       count, 
       @running_count := @running_count + count AS Counter 
FROM sumtest, (SELECT @running_count := 0) AS T1 
ORDER BY id

Produces:

id  count   Counter
1   30      30
2   10      40
3   5       45
4   20      65
5   15      80

So then it becomes a trivial matter to select all those rows where the counter is less than your desired sum by performing another select on this.

EDIT: Here is an example with a cursor. I've just thrown this function together for you (note my table is called sumtest and my account is the default root@localhost):

DELIMITER $$

DROP FUNCTION IF EXISTS `Test_Cursing` $$
CREATE DEFINER=`root`@`localhost` FUNCTION `Test_Cursing`(_running_total_limit INT) RETURNS int
BEGIN
  /* Why am I on StackOverflow at 01:41 on New Years Day. Dear oh dear, where's the beer? */
  DECLARE _running_count INT default 0;
  DECLARE _id INT;
  DECLARE _current_id INT;
  DECLARE _sum_count INT;

  DECLARE _cur CURSOR FOR SELECT id, count FROM sumtest ORDER BY id;

  OPEN _cur;

  read_loop: LOOP
    FETCH _cur INTO _id, _sum_count;

    SET _running_count = _running_count + _sum_count;

    IF _running_count > _running_total_limit   THEN
      LEAVE read_loop;
    END IF;

    SET _current_id = _id;

  END LOOP;

  CLOSE _cur;

    RETURN _current_id;

END $$

DELIMITER ;

Calling it this way:

SELECT Test_Cursing(50);

will return id = 3 - that is, the last id before the running total limit is breached. You can then use this to:

 SELECT * FROM sumtest WHERE id <= Test_Cursing(50);

Which returns:

id  count
1   30
2   10
3   5
dash
  • 89,546
  • 4
  • 51
  • 71
  • haven't tested it, but wouldn't it be ineffecient with all the selects? it will end up with 3 selects – clarkk Jan 01 '12 at 00:55
  • Depends. If the table is very, very large then you might see loss of performance from the running total, but you'd need to test (and use the EXPLAIN keyword for example). The select @running_count:=0 and the outer select are trivial and should not have any impact in comparison. On the other hand, you could use a cursor as you can exit as soon as you reach your desired total. It's also trivial and efficient to do in code as well with a SqlDataReader (if you know C#?) – dash Jan 01 '12 at 00:57
  • ok, thanks :) but how could you integrate a cursor in the query so it will exit as soon as the value is reached? – clarkk Jan 01 '12 at 01:17
  • and is it possible to do "the check" after the rows are picked so you also will get the first row after the value is reached? :) – clarkk Jan 01 '12 at 01:25
  • @clarkk I've updated with a cursor. This will return the id of the row just before your running total target is breached. This should be more efficient. Getting me to write a cursor at 01:30AM though... Grrrrrr :-) Hope this helps, and all the best for 2012. – dash Jan 01 '12 at 01:46
  • thank you very much..! you deserve a beer :) will look at you example – clarkk Jan 01 '12 at 11:30
  • but how often do I need to create/declare the function? each time a new connection (mysql_connect) is made? where is the function stored? I can't redeclare the function but if I only run `SELECT * FROM sumtest WHERE id <= Test_Cursing(50);` rows are returned.. Have updated my question with the query I'm using :) – clarkk Jan 01 '12 at 12:18
  • The function only needs to be created once - you create the function in the database and then call it from your PHP client code - SELECT * FROM my_table WHERE id <= Test_Cursing(50);. I created mine using MySql Administration tools. Try removing the DEFINER=`root`@`localhost` section and also make sure you have the correct table name (not sumtest) in the DECLARE _cur CURSOR FOR SELECT id, count FROM sumtest ORDER BY id; Finally, what version of MySql are you using? I tested using 5.5.16. – dash Jan 01 '12 at 13:24
  • I got it now! :) because I extended the `CREATE FUNCTION` query with `SELECT * FROM sumtest WHERE id <= Test_Cursing(50);` But I have one last question.. I can't really figure out how and where `_id` and `_sum_count` are set because the first appearance of them is `FETCH _cur INTO _id, _sum_count;` and the next is `SET _running_count = _running_count + _sum_count; SET _current_id = _id;` – clarkk Jan 01 '12 at 13:29
  • It's how cursors work - you need to create variables for each of the columns in your table you want to look at the value for in each loop of the cursor. So what the FETCH _cur INTO _id, _sum_count; statement really says is: Loop 1 SELECT 1 into _id, 30 into _sum_count. Loop 2: SELECT 2 into _id, 10 into _sum_count. Loop 3: SELECT 3 into _id, 5 into _sum_count. There will be a 4th loop, but then the exit condition will fire and we leave the loop. So what _id and _sum_count are really for are placeholders for the values on each row - does that help? – dash Jan 01 '12 at 13:39
  • now I get it! hehe.. thanks alot... I owe a couple of beers :D – clarkk Jan 01 '12 at 13:50
  • if `DECLARE _cur CURSOR FOR SELECT id, count FROM stock WHERE type=2 && product_id=_product_id ORDER BY time DESC, id DESC;` returns 0 rows an error is returned.. `No data - zero rows fetched, selected, or processed` how can I avoid this? My question is updated – clarkk Jan 01 '12 at 16:52
  • Your query should be WHERE type =2 AND product_id = _product_id in your cursor select line. You should also check that a row or rows with that product id exist. You can quickly check your cursor query by running the select by itself against your table just to make sure it works. How are you calling it? – dash Jan 01 '12 at 17:37
  • Or do you mean there are no rows for that product Id? – dash Jan 01 '12 at 17:45
  • I have updated my question.. Apparently the procedure without the `AND` was an older version :) I got it right in the current.. The query is returned from a method in a class, so it would be most efficient if the check could be made in the procedure :) Then I don't have to do alot of rewritting – clarkk Jan 01 '12 at 17:49
  • there are no rows with that product id – clarkk Jan 01 '12 at 17:49
  • If there are no products with that id what do you want to happen? Do you want to return 0 in that case? – dash Jan 01 '12 at 17:54
  • Try this just before the DECLARE CURSOR line: IF (SELECT COUNT(*) FROM stock WHERE group_id=_group_id && type=2 && product_id=_product_id) = 0 THEN RETURN 0 END IF; i.e. test to see if any rows exist, and if they don't, return immediately. – dash Jan 01 '12 at 18:06
  • oh, think I got it.. the IF statement has to be put after all DECLARE's :) – clarkk Jan 01 '12 at 18:24