0

There's no nice way to use COUNT(DISTINCT [...]) as your aggregation function for a T-SQL pivot. Is there any theoretical explanation for why T-SQL does not or can not provide the COUNT(DISTINCT [...]) option for pivot tables? Intuitively, it should not be any different from any other aggregation function. I've looked for any relevant documentation, but I can't even find where the invalidity of COUNT(DISTINCT [...]) is mentioned.

J. Mini
  • 1,868
  • 1
  • 9
  • 38
  • 1
    Because `PIVOT` is a restrictive and clunky operator. Just use conditional aggregation, which *does* allow `COUNT(DISTINCT...)` – Thom A Feb 21 '22 at 14:41
  • @Larnu Although I would recommend doing `DISTINCT` or `GROUP BY` in a subquery first, otherwise the compiler will not recognize the `CASE` as being unique – Charlieface Feb 21 '22 at 14:47
  • There are other limitations to `PIVOT`: for example you cannot do `COUNT(*)` or `COUNT(1)` even though there is no logical reason why not. You also can't do multiple columns – Charlieface Feb 21 '22 at 14:48
  • Most aggregates require a low amount of overhead between rows (e.g. `SUM` or `MIN`, etc only need to store one value). `COUNT(DISTINCT` has to store *all previously seen values*. – Damien_The_Unbeliever Feb 21 '22 at 14:49
  • 1
    "I've looked for any relevant documentation, but I can't even find where the invalidity of COUNT(DISTINCT [...]) is mentioned." - you expect documentation to contain *everything that is not possible*? That would make most documentation extremely unwieldy. Most documentation instead documents what *is possible*, which tends to at least be a finite list. – Damien_The_Unbeliever Feb 21 '22 at 14:50
  • @Charlieface , can you give an example of such behaviour? I've never seen it and I just did a quick test and couldn't repro: [db<>fiddle](https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=a51585e7a2d5d75e2bbb7e9124e889c8). – Thom A Feb 21 '22 at 14:53
  • 1
    "Does not" is easy; they just didn't build it. "Can not" almost never applies; few things are literally impossible, especially if you can easily determine the semantics. My guess would be that, since `COUNT(*)` is not allowed, they just disabled the use of `COUNT` altogether, even though `COUNT(DISTINCT column)` meets the [`PIVOT` requirement of being a "null-agnostic" aggregate](https://learn.microsoft.com/sql/t-sql/queries/from-transact-sql#pivot-clause) (note that `COUNT(*)` being disallowed is specifically called out). – Jeroen Mostert Feb 21 '22 at 14:53
  • @Larnu Check out these two execution plans https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=a5fcd602b1b0015e914c89f8ab5e9e29 Your version has two sorts and three scans, mine has a single scan and sort – Charlieface Feb 21 '22 at 15:09
  • I read your comment as meaning that you would get incorrect results, not the query plan would not be efficient, @Charlieface . – Thom A Feb 21 '22 at 15:10
  • @Larnu Sorry yes that is what I meant. The compiler does not see what the `CASE` is doing and assumes the values need to be re-aggregated per column. When this gets up to a lot of columns then the performance difference will be startling – Charlieface Feb 21 '22 at 15:12
  • *"I can't even find where the invalidity of `COUNT(DISTINCT [...])` is mentioned"* The grammar is quite clear [in the docs](https://learn.microsoft.com/en-us/sql/t-sql/queries/from-transact-sql?view=sql-server-ver15) `()` you cannot use `DISTINCT`. Either way, this whole question is opinion based: nothing stop Microsoft doing it, they just decided not to. – Charlieface Feb 21 '22 at 15:12
  • @Charlieface Asking why something is theoretically impossible isn't opinion based. If it's not impossible, then the answer only need to be that it isn't. – J. Mini Feb 21 '22 at 20:45
  • @JeroenMostert I feel like impossibility is a valid thing to discuss in SQL. SQL is clearly built off a theoretical framework (i.e. relational algebra), so it's often possible to explain why SQL can't do something by using that framework. Of course, SQL's probably Turning Complete, but everyone knows that's not the point. Also, ctrl+f isn't finding me your `()` bit. What section is it under? – J. Mini Feb 21 '22 at 20:55
  • Please see https://meta.stackoverflow.com/questions/293815/is-it-subjective-to-ask-about-why-something-wasnt-implemented-in-the-language among others for why this type of question isn't normally a good fit. I'm not against these types of discussions, there is often a lot to be learned as shown in the comments above, but they are not a good fit for [so]. Impossibility is normally subjective as there are very few things that are actually impossible – Charlieface Feb 21 '22 at 20:56
  • Well yeah, Turing completeness (or lack thereof) is indeed not the point (and T-SQL in particular is definitely Turing complete), but neither is relational algebra, typically. Database implementers tend be relatively unconcerned with the theoretical underpinnings, it just needs to run queries and produce results. :P In this case whether or not `PIVOT` supports `COUNT(DISTINCT ...)` definitely does not seem like anything constricted by pure theory, and even if it was, since we can obviously "imagine" calculating it, it is implementable. – Jeroen Mostert Feb 21 '22 at 20:59
  • In other words, to cut a very long story short, the answer to your question is almost certainly "nothing", and if you remove the "theoretically speaking", it's "time, money and motivation". It may well be that the way `PIVOT` is implemented *internally* creates a real cost-of-implementation barrier to implementing `COUNT(DISTINCT ...)`, but that would be pure speculation to anyone who's not an implementer at MS (or a very motivated reverser). – Jeroen Mostert Feb 21 '22 at 21:14
  • @JeroenMostert Judging by the fact that `PIVOT` always compiles down to `CASE` aggregates, and the fact that `DISTINCT` over such aggregates performs poorly as mentioned earlier, I'm guessing that making it perform well would take too much twisting of the compiler to turn into something similar to what I proposed, and given there is alternative syntax available, it seems too much effort. – Charlieface Feb 22 '22 at 00:38
  • @J.Mini Sorry realized I quoted from the `PIVOT` document here https://learn.microsoft.com/en-us/sql/t-sql/queries/from-using-pivot-and-unpivot?view=sql-server-ver15 but linked the `FROM` document which is slightly different: `aggregate_function ( value_column [ [ , ]...n ])` although not sure which function could possibly have multiple arguments – Charlieface Feb 22 '22 at 00:39
  • @Charlieface: no built-in aggregates do, but CLR aggregates can. As the docs specifically call out "user-defined aggregates", these are presumably allowed (but I'm too lazy to test). – Jeroen Mostert Feb 22 '22 at 11:02
  • @JeroenMostert Thanks, I was almost about to create a question re this. I didn't think of SQLCLR – Charlieface Feb 22 '22 at 11:18

0 Answers0