2

I have a panel data set: that is, times, ids, and values. I would like to do a ranking based on value for each date. I can achieve the sort very simply by running:

select * from tbl order by date, value

The issue I have is once the table is sorted in this way, how do I retrieve the row number of each group (that is, for each date I would like there to be a column called ranking that goes from 1 to N).

Example:

Input:

Date, ID, Value
 d1, id1, 2
 d1, id2, 1
 d2, id1, 10
 d2, id2, 11

Output:

Date, ID, Value, Rank
 d1, id2, 1, 1
 d1, id1, 2, 2
 d2, id1, 10, 1
 d2, id2, 11, 2
Mahmoud Gamal
  • 78,257
  • 17
  • 139
  • 164
Alex
  • 19,533
  • 37
  • 126
  • 195
  • 1
    Please don't name columns after SQL reserved words like DATE, even for illustrative purposes. – pilcrow Dec 06 '11 at 05:37
  • 1
    no, date is not a reserved word (due to the fact of too many people misuse on it, mysql allow date to be non-reserved) – ajreal Dec 06 '11 at 05:47
  • 1
    @ajreal: Could @pilcrow possibly mean that `DATE` is a *standard* SQL reserved word? I mean, I am not sure if there is such a thing. – Andriy M Dec 06 '11 at 07:07
  • Even if it is not reserved, it's not good practice to use `date` or `time` as table of column names. Too much confusion with the `DATE()` function or the `DATE` datatype. – ypercubeᵀᴹ Dec 06 '11 at 07:20
  • 2
    http://dev.mysql.com/doc/refman/5.0/en/reserved-words.html -- `MySQL permits some keywords to be used as unquoted identifiers because many people previously used them. Examples are those in the following list: ...` – ajreal Dec 06 '11 at 07:20
  • PS: I don't mean to encourage use of date or something very reserved keyword alike, but is just not a problem in mysql at the moment ... don't get upset – ajreal Dec 06 '11 at 07:26
  • these are all good points. the column in my database is named "dte" for this purpose. your points are all well taken. – Alex Dec 07 '11 at 03:28

3 Answers3

3

Absent window functions, you can order tbl and use user variables to compute rank over your partitions ("date" values) yourself:

SELECT "date",                                                -- D) Desired columns
       id,
       value,
       rank
  FROM (SELECT "date",                                        -- C) Rank by date
               id,
               value,
               CASE COALESCE(@partition, "date")
                 WHEN "date" THEN @rank := @rank + 1
                 ELSE             @rank := 1
               END AS rank,
               @partition := "date" AS dummy
          FROM (SELECT @rank := 0 AS rank,                    -- A) User var init
                       @partition := NULL AS partition) dummy
               STRAIGHT_JOIN
               (  SELECT "date",                              -- B) Ordering query
                         id,
                         value
                    FROM tbl
                ORDER BY date, value) tbl_ordered;

Update

So, what is that query doing?

We are using user variables to "loop" through a sorted result set, incrementing or resetting a counter (@rank) depending upon which contiguous segment of the result set (tracked in @partition) we're in.

In query A we initialize two user variables. In query B we get the records of your table in the order we need: first by date and then by value. A and B together make a derived table, tbl_ordered, that looks something like this:

rank | partition | "date" |  id  | value 
---- + --------- + ------ + ---- + -----
  0  |   NULL    |   d1   |  id2 |    1
  0  |   NULL    |   d1   |  id1 |    2
  0  |   NULL    |   d2   |  id1 |   10
  0  |   NULL    |   d2   |  id2 |   11

Remember, we don't really care about the columns dummy.rank and dummy.partition — they're just accidents of how we initialize the variables @rank and @partition.

In query C we loop through the derived table's records. What we're doing is more-or-less what the following pseudocode does:

rank      = 0
partition = nil

foreach row in fetch_rows(sorted_query):
  (date, id, value) = row

  if partition is nil or partition == date:
    rank += 1
  else:
    rank = 1

  partition = date

  stdout.write(date, id, value, rank, partition)

Finally, query D projects all columns from C except for the column holding @partition (which we named dummy and do not need to display).

pilcrow
  • 56,591
  • 13
  • 94
  • 135
  • thank you for this. i am trying to follow this query but it is a bit complicated. could you please give a little bit of a commentary on what is going on here? – Alex Dec 07 '11 at 03:31
  • thank you for your help. very clear explanation.. perfect for someone without much SQL experience as myself! – Alex Dec 07 '11 at 04:22
  • brilliant approach! rename the partition attr from the dummy table because MySQL will get confused. - "@partition := NULL AS partition" – ninjabber Jan 09 '15 at 16:07
  • Thanks, @ninjabber. This kind of manual partitioning gets unwieldy quickly — you sorely miss windowing/analytic function when you want to partition on three or more criteria — but it works well enough for this problem. – pilcrow Jan 09 '15 at 19:30
1

I know this is an old question but here is a shorter answer:

SELECT w.*, if(
      @preDate = w.date,
      @rank := @rank + 1,
      @rank := (@preDate :=w.date) = w.date
    ) rank
FROM tbl w
JOIN (SELECT @preDate := '' )a
ORDER BY date, value
palindrom
  • 18,033
  • 1
  • 21
  • 37
0

Would this do the trick?

select [DATE],ID,Value, 
(DENSE_RANK()  OVER (   
   PARTITION BY ID
 ORDER BY Date) )AS [DenseRank],    
ROW_NUMBER() OVER ( PARTITION BY ID ORDER BY [Date] DESC) AS RN     
from SomeTable 
ajreal
  • 46,720
  • 11
  • 89
  • 119
Tim
  • 41
  • 2
  • You might reserve an answer like this for a question about SQL Server. (Although I must say that you should also pay more attention to the question you are answering: partitioning was supposed to be done by `Date`, not by `ID`, and ordering was more likely by `Value` than by anything else.) – Andriy M Dec 06 '11 at 07:31