12

I don't understand why there are different results when using an ORDER BY clause in an analytic COUNT function.

Using a simple example:

with req as
 (select 1 as n, 'A' as cls
    from dual
  union
  select 2 as n, 'A' as cls
    from dual)
select req.*, count(*) over(partition by cls) as cnt from req;

gives the the following result:

N   CLS CNT
2   A   2
1   A   2

Whereas, when adding an ORDER BY in the analytic clause, the result is different!

with req as
 (select 1 as n, 'A' as cls
    from dual
  union
  select 2 as n, 'A' as cls
    from dual)
select req.*, count(*) over(partition by cls order by n) as cnt from req;

CNT column changed:

N   CLS CNT
1   A   1
2   A   2

Can someone explain please?

Thanks

Carmellose
  • 4,815
  • 10
  • 38
  • 56

3 Answers3

8

First, a link to docs. It's somewhat obscure, however.

Analytic clause consists of query_partition_clause, order_by_clause and windowing_clause. And, a really important thing about windowing_clause is

You cannot specify this clause unless you have specified the order_by_clause. Some window boundaries defined by the RANGE clause let you specify only one expression in the order_by_clause. Refer to "Restrictions on the ORDER BY Clause".

But not only can you not use windowing_clause without the order_by_clause, they are tied together.

If you omit the windowing_clause entirely, then the default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

The default windowing clause produces something like running total. COUNT returns 1 for first row, as there is only one row between the top of the window and the current row, 2 for the second row and so on.

So in your first query there is no windowing at all, but there is the default windowing in the second one.

And you can simulate the behavior of the first query by specifying fully unbounded window.

with req as
 (select 1 as n, 'A' as cls
    from dual
  union
  select 2 as n, 'A' as cls
    from dual)
select req.*, count(*) over(partition by cls order by n RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as cnt from req;

Yep

N   CLS CNT
1   A   2
2   A   2
Paul
  • 1,085
  • 12
  • 20
  • 2
    The default windowing clause produces a "running count" only in the special case when the ORDER BY clause produces a total ordering (with no ties). The default window clause uses `RANGE between...` rather than `ROWS between...` - so rows where the `ORDER BY` measure is the same will have the same `COUNT` as well. To get a true "running count" in these cases, one needs to add an explicit windowing clause, `ROWS between unbounded preceding and current row`. Other than that, nice answer! –  Dec 28 '16 at 16:32
  • That's a very good call, thanks. I thought about it and wrote "something like running total" as I didn't want to get into this `RANGE/ROWS` thing. Especially since it's covered well in the docs. I guess I should have though, but you've provided a very nice and clear explanation in your answer, so I'm gonna leave it at that. – Paul Dec 28 '16 at 17:56
6

The easiest way to think about this - leaving the ORDER BY out is equivalent to "ordering" in a way that all rows in the partition are "equal" to each other. Indeed, you can get the same effect by explicitly adding the ORDER BY clause like this: ORDER BY 0 (or "order by" any constant expression), or even, more emphatically, ORDER BY NULL.

Why you get the COUNT() or SUM() etc. for the entire partition has to do with the default windowing clause: RANGE between unbounded preceding and current row. "Range" (as opposed to "ROWS") means all rows "tied" with the current row are also included, even if they don't precede it. Since all rows are tied, this means the entire partition is included, no matter which row is "current."

  • Nicely explained!! – TapanHP Apr 29 '19 at 11:44
  • For me, once using order by, it sounds like using nested "partition by". Is this correct in any way? – DINA TAKLIT Oct 03 '19 at 21:50
  • @DINATAKLIT - no. –  Oct 03 '19 at 22:50
  • Why not? Coz once I ran an example with partition by the account_id order by the occurred month it sums up regarding the month once I omit it it sum up only regarding the account_id so that's why I said it's partition inside partition once we use order by another column. Does not this, in this case, make a scene, Does it? – DINA TAKLIT Oct 03 '19 at 23:52
1

Window functions will perform the aggregation over partition by (split by) value, when you omit ORDER BY clause the result will be similar to GROUP BY with output of each row . It is also possible to omit PARTITION BY, in which case there is just one partition containing all the rows

When you add ORDER BY clause to window function then it will perform the calculation in subsequent order within the same partition and start over with a different partition (group of values)

Values that are not distinct in the ORDER BY ordering are said to be peers, in COUNT() they will have the same calculated result of its last peer that will create gaps that maintain the total

Khaled Eid
  • 54
  • 5