1

The similar questions I've found don't seem to specifically answer how to get the separate counts on distinct elements of ALL columns without typing out each column out in a query.

I want to do something like this, for all columns, but without typing each column out:

SELECT     
  COUNT(distinct COL1) AS COL1DISTINCT,     
  COUNT(distinct COL2) AS COL2DISTINCT    
FROM TABLE_NAME;

But we don't want to type out all the individual columns because the same data needs to be retrieved on several other tables across several different clients, all of which will have different column names each time.

I tried finding a way to access the column names to create subqueries, but our meta-data tables are empty and don't provide any column names.

JNevill
  • 46,980
  • 4
  • 38
  • 63
  • 2
    Then write a program to create the SQL for you. *(Seriously, that's what Dynamic SQL is, you could do an internet search on that expression; `DYNAMIC SQL ORACLE`... SQL itself won't do this, but you could do it in Excel, your application or PLSQL.)* – MatBailie Mar 05 '18 at 15:35
  • 1
    Your metadata tables are empty. Like `USER_TAB_COLUMNS` is empty? If there is no way to get a list of columns in a table from your database, then you are kind of stuck for a solution. – JNevill Mar 05 '18 at 15:36
  • 1
    Often times, though, I just stick the list of columns in Excel and concatenate each line like `="COUNT(distinct " & A1 & ") AS COL1DISTINCT,"` and copy that down for each column in Row A. Cheap dynamic sql without having to script. Obviously not a great choice if you need to do this often. But... you have to get your column names/list from somewhere. – JNevill Mar 05 '18 at 15:38
  • @JNevill - correct, the USER_TABL_COLUMNS table is coming back empty. It exists, we just get nothing back when calling this: select * from USER_TAB_COLUMNS; – Davis Bickford Mar 05 '18 at 15:42
  • @DavisBickford - then you don't own the tables - maybe you access them through synonyms? Can you see the columns of the tables you're interested in from the `all_tab_columns` view instead? – Alex Poole Mar 05 '18 at 15:43
  • Note: I tried calling the following first, but attempted getting everything from after this was empty: select column_name from user_tab_columns where table_name='TABLE_NAME'; – Davis Bickford Mar 05 '18 at 15:45
  • @JNevill I think you did it! I can see the all_tab_columns view! I can get the column names like this: select column_name from all_tab_columns where table_name='TABLE_NAME'; From there, how would I construct the full query for all the distinct values? – Davis Bickford Mar 05 '18 at 15:50
  • You can write dynamic sql (create a string that holds a sql statement and then execute it with `EXECUTE IMMEDIATE`) [explained in depth here](https://docs.oracle.com/cloud/latest/db112/LNPLS/dynamic.htm) or you can be cheap and do the excel thing like I do (or really any scripting language to generate the SQL string). It's a little too broad to be answered here. – JNevill Mar 05 '18 at 15:54
  • @DavisBickford - does it have to be in the format your sample query has, with one result set column per table column; or would one result set row per column work for you? – Alex Poole Mar 05 '18 at 15:56
  • @AlexPoole I think either way is fine. We just need to get the count of distinct values for each column. All the data is getting reformatted to an excel sheet anyway. – Davis Bickford Mar 05 '18 at 16:02

1 Answers1

3

If you're OK with one result-set row per column you could adapt this XML magic trick:

select owner, table_name, column_name,
  to_number(xmlquery('/ROWSET/ROW/C/text()'
    passing xmltype(dbms_xmlgen.getxml(
      'select count(distinct "' || column_name || '") as c '
      || 'from "' || owner || '"."' || table_name || '"'))
  returning content)) as c
from all_tab_columns
where owner = '<your table owner>'
and data_type in ('NUMBER', 'DATE', 'TIMESTAMP', 'CHAR', 'VARCHAR2', 'NCHAR', 'NVARCHAR2');

... listing all the data types you need to be able to count; really that is to exclude those that can't handle distinct like CLOB, but as you may have nested tables etc. as well it's probably going to be simpler to list those you do want and expect to be able to count.

The dbms_xmlgen() call converts the result of that select count(distinct ...) ... query, which is effectively constructed dynamically, into an XML structure, and you can then pull the count out from that with XMLQuery() (instead of the deprecated extractvalue() in the linked answer).

As a very quick demo:

create table t42 (id number, str varchar2(20));
insert into t42 values (1, 'Test');
insert into t42 values (2, 'Test');
insert into t42 values (3, 'Test 2');
insert into t42 values (3, null);

select owner, table_name, column_name,
  to_number(xmlquery('/ROWSET/ROW/C/text()'
    passing xmltype(dbms_xmlgen.getxml(
      'select count(distinct "' || column_name || '") as c '
      || 'from "' || owner || '"."' || table_name || '"'))
  returning content)) as c
from all_tab_columns
where owner = 'MY_SCHEMA'
and table_name = 'T42'
and data_type in ('NUMBER', 'DATE', 'TIMESTAMP', 'CHAR', 'VARCHAR2', 'NCHAR', 'NVARCHAR2');

OWNER           TABLE_NAME      COLUMN_NAME              C
--------------- --------------- --------------- ----------
MY_SCHEMA       T42             ID                       3
MY_SCHEMA       T42             STR                      2

Is there a way to also get a count of nulls in every column as well?

The count() function ignores nulls, so to count those you have to convert them, e.g. with

count(case when <your_column> is null then 1 end)

You an include that here either with a second XMLQuery clause:

select owner, table_name, column_name,
  to_number(xmlquery('/ROWSET/ROW/C/text()'
    passing xmltype(dbms_xmlgen.getxml(
      'select count(distinct "' || column_name || '") as c '
      || 'from "' || owner || '"."' || table_name || '"'))
  returning content)) as distinct_count,
  to_number(xmlquery('/ROWSET/ROW/C/text()'
    passing xmltype(dbms_xmlgen.getxml(
      'select count(case when "' || column_name || '" is null then 1 end) as c '
      || 'from "' || owner || '"."' || table_name || '"'))
  returning content)) as null_count
from all_tab_columns
where owner = 'MY_SCHEMA'
and table_name = 'T42'
and data_type in ('NUMBER', 'DATE', 'TIMESTAMP', 'CHAR', 'VARCHAR2',
    'NCHAR', 'NVARCHAR2');

OWNER           TABLE_NAME      COLUMN_NAME     DISTINCT_COUNT NULL_COUNT
--------------- --------------- --------------- -------------- ----------
MY_SCHEMA       T42             ID                           3          0
MY_SCHEMA       T42             STR                          2          1

or with a single XMLTable that extracts both column values from the generated XML, which is modified to do both counts at once:

select a.owner, a.table_name, a.column_name,
  x.distinct_count, x.null_count
from
(
  select owner, table_name, column_name,
    dbms_xmlgen.getxml(
      'select count(distinct "' || column_name || '") as c1,'
        || 'count(case when "' || column_name || '" is null then 1 end) as c2 '
      || 'from "' || owner || '"."' || table_name || '"') as xml_clob
  from all_tab_columns
  where owner = 'MY_SCHEMA'
  and table_name = 'T42'
  and data_type in ('NUMBER', 'DATE', 'TIMESTAMP', 'CHAR', 'VARCHAR2',
    'NCHAR', 'NVARCHAR2')
) a
cross join xmltable (
  '/ROWSET/ROW'
  passing xmltype(a.xml_clob)
  columns distinct_count number path 'C1',
    null_count number path 'C2'
) x;

OWNER           TABLE_NAME      COLUMN_NAME     DISTINCT_COUNT NULL_COUNT
--------------- --------------- --------------- -------------- ----------
MY_SCHEMA       T42             ID                           3          0
MY_SCHEMA       T42             STR                          2          1
Alex Poole
  • 183,384
  • 11
  • 179
  • 318
  • This is perfect! The team grabbing this data won't have permissions to create tables, but the primary query works great. I think we'll likely add an extra condition to match table_name, because we won't actually need every table as well. Is there a way to also get a count of nulls in every column as well? Sorry if that's too far off topic, I can create a new question to figure that out, but I figured I'd ask you while I have you, the author, here! – Davis Bickford Mar 05 '18 at 16:44
  • The `t42` was just a dummy table so you could compare the columns and counts in the output with something. You don't need to create new tables to run this against existing ones. – Alex Poole Mar 05 '18 at 17:08
  • @DavisBickford - I've added a way to get the count of nulls in each column too. – Alex Poole Mar 05 '18 at 17:25