Row number from one column, but then re-order using another column

Question

I'm aggregating (summing) some data from a purchases table, aggregated by total amount per region.

Data looks something like the following:

| id | region | purchase_amount |
| 1  | A      | 30              |
| 2  | A      | 35              |
| 3  | B      | 41              |

The aggregated data then looks like this, ordered by total_purchases:

| region  | total_purchases |
|   B     |  1238           |
|   A     |  910            |
|   D     |  647            |
|   C     |  512            |

I'd like to get a ranking for each region, ordered by total_purchases. I can do this using row_number (using SQLAlchemy at the moment) and this results in a table looking like:

| rank | region  | total_purchases |
|  1   |   B     |  1238           |
|  2   |   A     |  910            |
|  3   |   D     |  647            |
|  4   |   C     |  512            |

However, there's one more column that I'd like to group by and that's:
I want region 'C' to always be the first row, but keep it's ranking.

This would ideally result in a table looking like:

| rank | region  | total_purchases |
|  4   |   C     |  512            |
|  1   |   B     |  1238           |
|  2   |   A     |  910            |
|  3   |   D     |  647            |

I can do one or the other, but I can't seem to combine these 2 features together. If I use a row_number() function, I get the proper ordering.

I can bring the region 'C' row always to the top using an ordering across 2 columns:

ORDER BY
     CASE WHEN region = 'C' THEN 1 ELSE 0 DESC,
     total_purchases DESC

However, I can't seem to combine these 2 requirements into the same query.

What database are you using? Please tag your question appropriately. — Gordon Linoff, Apr 02 '15 at 02:23
. . My answer should work in Postgres. However, if you want valid syntax, you should edit your question and show your query. — Gordon Linoff, Apr 02 '15 at 03:04

score 2 · Answer 1 · answered Apr 02 '15 at 02:31

2

USE CTE to achieve that, put your ROW_NUMBER in your main query

;WITH C AS(
    SELECT ROW_NUMBER() OVER (ORDER BY total_purchases DESC) AS Rn
           ,region
           ,total_purchases
    FROM your_table
)
SELECT *
FROM C
ORDER BY (CASE WHEN region = 'C' THEN 1 ELSE 0 END) DESC
         ,total_purchases DESC

answered Apr 02 '15 at 02:31

sqluser

5,502
7
36
50

I tried this on the original table, but I had to add in the aggregation: ;WITH C AS( SELECT ROW_NUMBER() OVER (ORDER BY total_purchases DESC) AS Rn ,region ,SUM(purchases) AS total_purchases FROM your_table GROUP BY region ) but it said that the total_purchases row did not exist yet. How would I get around this? – J3Y Apr 02 '15 at 02:54
Instead of `ORDER BY total_purchases` put `ORDER BY SUM(purchases)` – sqluser Apr 02 '15 at 03:03
Ok got it, that works. Now to integrate it into our existing SQLAlchemy code. – J3Y Apr 02 '15 at 03:06

score 1 · Answer 2 · answered Apr 02 '15 at 02:26

1

Does this work?

select row_number() over (order by total_purchases desc) as rank,
       region, total_purchases
from table t
order by (case when region = 'C' then 1 else 0 end) desc, total_purchases desc;

answered Apr 02 '15 at 02:26

Gordon Linoff

1,242,037
58
646
786

That works fine. Now to integrate it into our existing SQLAlchemy queries. – J3Y Apr 02 '15 at 03:07

score 0 · Answer 3 · edited May 23 '17 at 12:14

This is about Postgres, we have a proper boolean type and can sort by any boolean expression directly:

SELECT rank() OVER (ORDER BY sum(purchase_amount) DESC NULLS LAST) AS rank
     , region
     , sum(purchase_amount) AS total_purchases
FROM   purchases 
GROUP  BY region
ORDER  BY (region <> 'C'), 1, region;  -- region as tiebreaker

Explain

Window functions are executed after aggregate functions, so we don't need a subquery or CTE here.
- Best way to get result count before LIMIT was applied
NULLS LAST?
- PostgreSQL sort by datetime asc, null first?
The final 1 is referencing the ordinal position 1 in the SELECT list, so we don't have to repeat the expression.
ORDER BY (region <> 'C') ?
- Sorting null values after all others, except special
The window function rank() seems adequate. As opposed to row_number(), equal total_purchases rank the same. To break possible ties and get a stable result in such cases, add region (or whatever) as last item to ORDER BY.
If you use row_number() and only ORDER BY sum(purchase_amount), equal totals can switch places in two separate calls. You could add another item to the ORDER BY clause of row_number() for a similar result, but an equal rank is more appropriate for equal total_purchases I'd say.

Row number from one column, but then re-order using another column

3 Answers3

Explain