-1

I have a query:

select y, m, count(distinct id) as y_m_cnt, *other columns* from tab group by y, m, *other columns*
),
t2 as (
select y, count(distinct id) as y_cnt, *other columns* from tab group by y, *other columns*
)
select * t1.y, t1.m, t1.y_m_cnt, t2.y_cnt, t1.*other columns* from t1 left join t2 on t1.y=t2.y, t1.*other columns* = t2.*other columns*

Where y is a year and m is a month. The idea is that i want to count unique ids in two different aggregations, which different only in period (one is year + month, another is year only). The code above is working and almost ok, but i dislike it. Is it possible to re-write this code and make it shorter?

P.S. This code is simplified. In reality it has much more columns, so i need to select same columns twice and then i have terribly big join condition (query is more than 100 lines). That's why I'm looking for simplifying it. Found some solutions with partition but they don't cover problem of unique ids.

Edit: Hoped it would work without full query, but seems not. The problem is I need year_count only because of aggregating without mm, dd. Is it possible to make this query shorter?

with full_count as (
    select param1,
           yy,
           mm,
           dd,
           param2,
           param3,
           param4,
           count(distinct id1)  as cnt_id1,
           count(distinct id2)  as cnt_id2,
           count(distinct id3)  as cnt_id3,
           cast(null as string) as param5,
           cast(null as string) as param6,
           cast(null as string) as param7,
           cast(null as string) as param8,
           param9,
           param10,
           param11,
           param12,
           param13,
           param14,
           param15,
           param16,
           param17,
           param18,
           param19,
           param20,
           param21,
           param22

    from table
    group by param1,
             yy,
             mm,
             dd,
             param2,
             param3,
             param4,
             param9,
             param10,
             param11,
             param12,
             param13,
             param14,
             param15,
             param16,
             param17,
             param18,
             param19,
             param20,
             param21,
             param22
),
     year_count as (
         select param1,
                yy,
                param2,
                param3,
                param4,
                count(distinct id1)  as cnt_id1_yy,
                count(distinct id2)  as cnt_id2_yy,
                count(distinct id3)  as cnt_id3_yy,
                cast(null as string) as param5,
                cast(null as string) as param6,
                cast(null as string) as param7,
                cast(null as string) as param8,
                param9,
                param10,
                param11,
                param12,
                param13,
                param14,
                param15,
                param16,
                param17,
                param18,
                param19,
                param20,
                param21,
                param22

         from table
         group by param1,
                  yy,
                  param2,
                  param3,
                  param4,
                  param9,
                  param10,
                  param11,
                  param12,
                  param13,
                  param14,
                  param15,
                  param16,
                  param17,
                  param18,
                  param19,
                  param20,
                  param21,
                  param22
     )

select fc.param1     as param1,
       fc.yy         as yy,
       fc.mm         as mm,
       fc.dd         as dd,
       fc.param2     as param2,
       fc.param3     as param3,
       fc.param4     as param4,
       fc.cnt_id1    as cnt_id1,
       fc.cnt_id2    as cnt_id2,
       fc.cnt_id3    as cnt_id3,
       yc.cnt_id1_yy as cnt_id1_yy,
       yc.cnt_id2_yy as cnt_id2_yy,
       yc.cnt_id3_yy as cnt_id3_yy,
       fc.param5     as param5,
       fc.param6     as param6,
       fc.param7     as param7,
       fc.param8     as param8,
       fc.param9     as param9,
       fc.param10    as param10,
       fc.param11    as param11,
       fc.param12    as param12,
       fc.param13    as param13,
       fc.param14    as param14,
       fc.param15    as param15,
       fc.param16    as param16,
       fc.param17    as param17,
       fc.param18    as param18,
       fc.param19    as param19,
       fc.param20    as param20,
       fc.param21    as param21,
       fc.param22    as param22
from full_count fc
         left join year_count yc on
            coalesce(fc.param1, 'tmp') = coalesce(yc.param1, 'tmp') and
            coalesce(fc.yy, 'tmp') = coalesce(yc.yy, 'tmp') and
            coalesce(fc.param2, 'tmp') = coalesce(yc.param2, 'tmp') and
            coalesce(fc.param3, 'tmp') = coalesce(yc.param3, 'tmp') and
            coalesce(fc.param4, 'tmp') = coalesce(yc.param4, 'tmp') and
            coalesce(fc.param6, 'tmp') = coalesce(yc.param6, 'tmp') and
            coalesce(fc.param9, 'tmp') = coalesce(yc.param9, 'tmp') and
            coalesce(fc.param10, 'tmp') = coalesce(yc.param10, 'tmp') and
            coalesce(fc.param11, 'tmp') = coalesce(yc.param11, 'tmp') and
            coalesce(fc.param12, 'tmp') = coalesce(yc.param12, 'tmp') and
            coalesce(fc.param13, 'tmp') = coalesce(yc.param13, 'tmp') and
            coalesce(fc.param14, 'tmp') = coalesce(yc.param14, 'tmp') and
            coalesce(fc.param15, 'tmp') = coalesce(yc.param15, 'tmp') and
            coalesce(fc.param16, 'tmp') = coalesce(yc.param16, 'tmp') and
            coalesce(fc.param17, 'tmp') = coalesce(yc.param17, 'tmp') and
            coalesce(fc.param18, 'tmp') = coalesce(yc.param18, 'tmp') and
            coalesce(fc.param19, 'tmp') = coalesce(yc.param19, 'tmp') and
            coalesce(fc.param20, 'tmp') = coalesce(yc.param20, 'tmp') and
            coalesce(fc.param21, 'tmp') = coalesce(yc.param21, 'tmp') and
            coalesce(fc.param22, 'tmp') = coalesce(yc.param22, 'tmp')
Andrey
  • 19
  • 7

1 Answers1

0

You don't specify the database you are using, but many support count(distinct) as a window function:

select t.*,
       count(distinct id) over (partition by y) as y_unique,
       count(distinct id) over (partition by y, m) as ym_unique
from tab t;

There are work-arounds in databases that don't support distinct in window functions.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • SQL Server, Postgresql, MySql, MariaDB and DB2 of the major databases do not support DISTINCT with COUNT() window function. Which are the *many* databases that you refer to? Is it only Oracle? – forpas Sep 14 '21 at 17:28
  • I think this query will give a line for each record in table, but I need aggregated lines. – Andrey Sep 14 '21 at 17:50
  • (1) That is not what your question suggests (with all the fuss about other columns). (2) The results you want are totally unclear. – Gordon Linoff Sep 14 '21 at 19:47