1

I have below table and i want to count distinct values that cross between two columns.

ID_DATE     DESCRIPT1   DESCRIPT2
20191001    A           R
20191001    D           B
20191001    B           D
20191001    A           B
20191002    A           B
20191002    C           A
20191002    A           B

Below is my query but the result is not accurate

SELECT  
COUNT(distinct DESCRIPT1 || ' - ' ||  DESCRIPT2) AS ALL_DESCRIPT,
COUNT(DISTINCT DESCRIPT1) AS DESCRIPT_A, 
COUNT(DISTINCT DESCRIPT2) AS DESCRIPT_B, 
ID_DATE FROM MY_TABLE  GROUP BY ID_DATE;

My result,

enter image description here

ALL_DESCRIPT    DESCRIPT_A  DESCRIPT_B  ID_DATE
4               3           3           20191001
2               2           2           20191002

In my result, the column ALL_DESCRIPT with ID_DATE 20191002 gives me total of 2 instead of 3. It should be 3 because i have A,B and C which is a total of 3 in both column DESCRIPT1 and DESCRIPT2 Where am i doing it wrong.

Below is insert query for testing in oracle just in case one requires.

   INSERT all 
   INTO SRC_DATA (ID_DATE, DESCRIPT1, DESCRIPT2) VALUES ('20191001','A','R')
   INTO SRC_DATA (ID_DATE, DESCRIPT1, DESCRIPT2) VALUES ('20191001','D','B')
   INTO SRC_DATA (ID_DATE, DESCRIPT1, DESCRIPT2) VALUES ('20191001','B','D')
   INTO SRC_DATA (ID_DATE, DESCRIPT1, DESCRIPT2) VALUES ('20191001','A','B')
   INTO SRC_DATA (ID_DATE, DESCRIPT1, DESCRIPT2) VALUES ('20191002','A','B')
   INTO SRC_DATA (ID_DATE, DESCRIPT1, DESCRIPT2) VALUES ('20191002','C','A')
   INTO SRC_DATA (ID_DATE, DESCRIPT1, DESCRIPT2) VALUES ('20191002','A','B')
   SELECT * FROM dual;
Omari Victor Omosa
  • 2,814
  • 2
  • 24
  • 46
  • May be easier to make it one column with a UNION: `select date, descript1 from table union all select date descript2 from table` – Caius Jard Dec 06 '19 at 07:29
  • 1
    With `COUNT(distinct DESCRIPT1 || ' - ' || DESCRIPT2)` you count distinct pairs, of which you have only two on 20191002: A-B and C-A. – Thorsten Kettner Dec 06 '19 at 07:57

4 Answers4

2

To add to littlefoot's query and give the other columns, this is a pivot operation, it seems:

select 
  id_date, 
  count(distinct descript) all_descript,
  count(case when descript = 'A' then 1 end) as descript_a,
  count(case when descript = 'B' then 1 end) as descript_B
from 
(
  select id_date, descript1 descript
  from src_data
  union all
  select id_date, descript2 descript 
  from src_data
) x
group by id_date
order by id_date;

You can add more columns for different letters by following the pattern of putting another latter in the string and naming the column differently. It works by having a case when return a non null value when the data is e.g. A, and a null when the data is not A. Count counts non null data only. It might make more sense to you to use SUM(CASE WHEN descript = 'A' THEN 1 ELSE 0 END) - it's the same effect


Edit: actually I think I misunderstood the request. Try this instead:

    select 
      id_date, 
      count(distinct descript) all_descript,
      count(distinct descript1) as descript_a,
      count(distinct descript2) as descript_B
    from 
    (
      select id_date, descript1 descript, descript1, descript2
      from src_data
      union all
      select id_date, descript2 descript, null, null
      from src_data
    ) x
    group by id_date
    order by id_date
Caius Jard
  • 72,509
  • 5
  • 49
  • 80
1

I can't see images, but - from what you described, looks like

SQL> select id_date, count(distinct descript) cnt
  2  from (select id_date, descript1 descript from src_data
  3        union all
  4        select id_date, descript2 descript from src_data
  5       )
  6  group by id_date
  7  order by id_date;

ID_DATE         CNT
-------- ----------
20191001          4
20191002          3

SQL>

If you add a column which shows the source (what in my example), then you'd

SQL> select id_date,
  2    count(distinct descript) cnt,
  3    count(distinct case when what = 'A' then descript end) descript_a,
  4    count(distinct case when what = 'B' then descript end) descript_b
  5  from (select 'A' what, id_date, descript1 descript from src_data
  6        union all
  7        select 'B' what, id_date, descript2 descript from src_data
  8       )
  9  group by id_date
 10  order by id_date;

ID_DATE         CNT DESCRIPT_A DESCRIPT_B
-------- ---------- ---------- ----------
20191001          4          3          3
20191002          3          2          2

SQL>
Littlefoot
  • 131,892
  • 15
  • 35
  • 57
  • Works well, I have added text result. How can i get the results now including also for DESCRIPT_A and DESCRIPT_B? – Omari Victor Omosa Dec 06 '19 at 07:38
  • Question, Why do you `case what = 'A'` or B, since my actual table is dynamic? – Omari Victor Omosa Dec 06 '19 at 08:11
  • I know nothing about *dynamic*. A and B in my example simply show which table (actually, a column in your table) certain row belongs to. A = DESCRIPT1, B = DESCRIPT2. – Littlefoot Dec 06 '19 at 08:13
  • @Omari Victor Omosa: This is because your result column names are confusing. `DESCRIPT_A` and `DESCRIPT_B` suggest that you are counting 'A' and 'B' values, while it's actually just distinct `DESCRIPT1` and `DESCRIPT2` values you want to count. Littlefoot supposed you wanted to look for 'A' and 'B' values and count them. – Thorsten Kettner Dec 06 '19 at 08:16
  • @ThorstenKettner You got me right, I actually just wanted distinct DESCRIPT1 and DESCRIPT2 – Omari Victor Omosa Dec 06 '19 at 08:31
1

When having trouble with aggregations, you can always write separate aggregate queries and join them then. In your case this could be:

select t1.all_descript, t2.descript_a, t2.descript_b, id_date
from -- this subquery gets you the overall distinct count
(
  select id_date, count(*) as all_descript
  from 
  (
    select id_date, descript1 from mytable
    union
    select id_date, descript2 from my_table
  )
  group by id_date
) t1
join -- this subquery gets you the separate distinct counts
(
  select
    id_date,
    count(distinct descript1) as descript_a,
    count(distinct descript2) as descript_b
  from my_table
  group by id_date
) t2 using (id_date)
order by id_date;
Thorsten Kettner
  • 89,309
  • 7
  • 49
  • 73
1

This should solve your query, I simply used a memory table to store different columns in memory then called them and counted in distinct form.

WITH b AS  (
              SELECT id_date,DESCRIPT1  col1,descript1,descript2 FROM 
              SRC_DATA
              UNION 
            SELECT id_date,DESCRIPT2  col1,descript1,descript2 FROM SRC_DATA
           )
SELECT id_date,count(DISTINCT col1) col1,count(DISTINCT descript1) 
    descript1,count(DISTINCT descript2) descript2 
FROM b
GROUP BY id_date
gwatene
  • 71
  • 2
  • This seems to be a simple answer. How will memory table perform if i have over 10Million records, Will it be able to hold? – Omari Victor Omosa Dec 06 '19 at 08:19
  • @gwatene: This is almost Caius Jard's second query. I like theirs better for 1) using `UNION ALL` instead of `UNION` and 2) using nulls to ease distinct counting. – Thorsten Kettner Dec 06 '19 at 08:20
  • 1
    @Omari Victor Omosa: You need a `UNION` for the `ALL_DESCRIPT` result anyway. And you need `DISTINCT`, too. That's expensive, but that's the way it is. Why don't you look at the explain plans for the different queries to see how the optimizer considers the costs? – Thorsten Kettner Dec 06 '19 at 08:23
  • `UNION` removes duplicates, in this case @Omari needed distinct for col1, `UNION ALL` returns everything from the results – gwatene Dec 06 '19 at 08:26
  • @gwatene: No, you are using `COUNT(DISTINCT)` anyway, so `UNION` instead of `UNION ALL` has no effect, except for more more work for the DBMS. Moreover `UNION` works on whole rows of course, so you are removing some duplicates and keeping others. – Thorsten Kettner Dec 06 '19 at 08:30
  • @Omari Victor Omosa: What gwatene is calling "memory table" here is just an ad-hoc view. It makes no difference whether you put it in a separate `WITH` clause or right into the `FROM` clause as in the other answers. – Thorsten Kettner Dec 06 '19 at 08:37