2

I need to do an advanced selection in SQL, but I'm stuck.

I have the following table:

id | user_id | position | value
1  |   1     |    1     |   1
1  |   1     |    2     |   1
1  |   1     |    3     |   3
1  |   2     |    1     |   2
1  |   2     |    2     |   2
1  |   2     |    3     |   2
1  |   3     |    1     |   3
1  |   3     |    2     |   2
1  |   3     |    3     |   1

I need a query that gives me a result set ordered as this:

  1. Total sum for each user (user 1: 5, user 2: 6, user 3: 6)
  2. Value for position 3 for each user (user 1: 3, user 2: 2, user 3: 1)
  3. Val for pos 3 + val for pos 2 for each user (user 1: 4, user 2: 4, user 3: 4)
  4. Val for pos 3 + val for pos 2 + val for pos 1 for each user (user 1: 5, user 2: 6, user 3: 6)

This is just an example, the table can actually contain more positions, so I need a query that is not hard coded on three positions.

NOTE: There is always the same number of positions for each user_id. In this example it's three, but I could as well truncate the table and add data for each user using five positions.

An ugly solution is to assume that there are never no more than ten positions, creating pos1, pos2, and so on as columns and just add them accordingly in the query. If you only use three positions you get a lot of NULL values and you also get stuck with a maximum of ten positions.

I have considered the use of temporary tables, but haven't found a breakthrough there either.

How would you do it?

PM 77-1
  • 12,933
  • 21
  • 68
  • 111
peltors
  • 43
  • 4

2 Answers2

1

You can potentially do something like this:

select user_id
    , sum(value) as value_sum
    , (select value from my_table where user_id = t.user_id and position = 3) as pos_3_val
    , (select sum(value) from my_table where user_id = t.user_id and position >= 2) as pos_2_3_val
    , (select sum(value) from my_table where user_id = t.user_id and position >= 1) as pos_1_2_3_val
from my_table as t
group by user_id
order by user_id

I think this should work in most any RDBMS.

If it has to by dynamic, you could potentially create this query in stored procedure or your application and run it.

You could also dynamically pivot your results from a query like this:

select *
    , (
        select sum(value)
        from my_table
        where user_id = t.user_id
            and position >= t.position
    ) as running_total_descending
from my_table t

Please let us know if any of this works, and if you have trouble creating a dynamic version (and which RDBMS).

UPDATE

Now that we know the RDBMS (MySQL) we can have a specific dynamic version:

set @sql = null;
select
  group_concat(distinct
    concat(
      ' sum(case when position >= ',
      position,
      ' then value end) as pos_',
      position,
      '_plus'
    )
  ) into @sql
from my_table;

set @sql = concat('select user_id,', @sql, ' from my_table t group by user_id;');

prepare stmt from @sql;
execute stmt;
deallocate prepare stmt;

SQL Fiddle

Special thanks to @bluefeet for posting this type of solution often.

I should also note that many devs believe this type of pivoting often belongs in the application or front-end. I'm no exception, both for separation of concerns and because your app can generally scale better than your OLTP database.

Tim Lehner
  • 14,813
  • 4
  • 59
  • 76
  • Neither of your queries completely conforms to OP's requirements. First one is hard-coded and second - does bot produce single rows per each user. – PM 77-1 Apr 10 '13 at 19:33
  • @PM77-1 I believe I addressed both of your points in my original answer since we didn't yet know the RDBMS. At any rate, I've added a dynamic solution. – Tim Lehner Apr 10 '13 at 21:50
1

I need a query that is not hard coded on three positions.

Then you can't output the subtotals in columns. SQL requires that the columns are fixed at the time you prepare the query; you can't write a query that appends more columns dynamically as it discovers how many distinct values are in the data.

You can, however, output a dynamic number of rows.

SELECT t1.user_id, CONCAT(t1.position, '-', MAX(t2.position)) AS position_range, 
  SUM(t2.value) AS subtotal
FROM MyTable t1
INNER JOIN MyTable t2
  ON t1.user_id = t2.user_id AND t1.position <= t2.position
GROUP BY t1.user_id, t1.position;

The output is:

+---------+----------------+----------+
| user_id | position_range | subtotal |
+---------+----------------+----------+
|       1 | 1-3            |        5 |
|       1 | 2-3            |        4 |
|       1 | 3-3            |        3 |
|       2 | 1-3            |        6 |
|       2 | 2-3            |        4 |
|       2 | 3-3            |        2 |
|       3 | 1-3            |        6 |
|       3 | 2-3            |        3 |
|       3 | 3-3            |        1 |
+---------+----------------+----------+

You'll have to write application code to pivot this into columns after you fetch the whole result set.

Sorry, there is no way to write a fully dynamic pivot query in any brand of RDBMS. You have two choices:

  1. Write code to generate the SQL based on data, as shown in @TimLehner's updated answer

  2. Write code to post-process a general-purpose query like the one I show above.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • Sweet solution, it leads me to the next step, the sort order described in the orig. post. I added "SUM(t1.value) AS grandtotal" to the SELECT section and "ORDER BY grandtotal DESC, subtotal DESC" to the end. This didn't do the trick for me. The desired result is a result set containing each user_id only once, ordered as described above. – peltors Apr 10 '13 at 21:19
  • 1
    You can't do that dynamically in SQL, because it would need a variable number of columns. You'll have to post-process the result set in application code. – Bill Karwin Apr 10 '13 at 21:31
  • Ok, it's worth a million to me hearing that from someone else. Otherwise I would be trying to do this until the end of days :) – peltors Apr 10 '13 at 21:41