2

I like to find all columns in my Oracle database schema that only contain numeric data but having a non-numeric type. (So basically column-candidates with probably wrong chosen data types.)

I have a query for all varchar2-columns:

SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE
FROM user_tab_cols
WHERE DATA_TYPE = 'VARCHAR2';

Furthermore I have a query to check for any non-numeric data inside a table myTable and a column myColumn:

SELECT 1
FROM myTable
WHERE NOT REGEXP_LIKE(myColumn, '^[[:digit:]]+$');

I like to combine both queries in that way that the first query only returns the rows where not exists the second.

The main problem here is that the first query is on meta layer of the data dictionary where TABLE_NAME and COLUMN_NAME comes as data and I need that data as identifiers (and not as data) in the second query.

In pseudo-SQL I have something like that in mind:

SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE
FROM user_tab_cols
WHERE DATA_TYPE = 'VARCHAR2'
AND NOT EXISTS
(SELECT 1 from asIdentifier(TABLE_NAME) 
WHERE NOT REGEXP_LIKE(asIdentifier(COLUMN_NAME), '^[[:digit:]]+$'));
Fabian Barney
  • 14,219
  • 5
  • 40
  • 60
  • 1
    OK - the problem you think you have (you do have it, it's a real problem) will probably lead you to dynamic SQL. Before you even go there, though - you need to address a few other, potentially simpler issues. First, what data types may contain "numeric data" to begin with? I suppose only VARCHAR2, CHAR and similar; how would a DATE or TIMESTAMP column contain anything that could be called "numeric data"? Then: things like REGEXP_LIKE will work only on string type columns anyway. Also, if you look at CHAR data type, should something with trailing spaces (`'123 '`) be considered number? –  Dec 18 '17 at 17:40
  • Good point. I'll adjust the question. Basically I am interested in varchar2-columns. About considering spaces it does not really matter - do not need trimming but would be ok. – Fabian Barney Dec 18 '17 at 17:43
  • 2
    Then: if you are looking for strings that represent numbers, isn't `1.5` a number also? How about `-33`? Your current solution will not view them as numbers. How about positive integers, represented in scientific notation? `'1.0E+03'` is 1000 in scientific notation. –  Dec 18 '17 at 17:45
  • The query does not have to be perfect. It's just a query to hint me for wrong choosen column types. I want to check them afterwards. For the first shot only integers would be fine. Don't care about negative numbers and floats for the first shot. – Fabian Barney Dec 18 '17 at 17:49

3 Answers3

2

Create a function as this:

create or replace function isNumeric(val in VARCHAR2) return INTEGER AS
res NUMBER;
begin
   res := TO_NUMBER(val);
   RETURN 1;
EXCEPTION
   WHEN OTHERS THEN
      RETURN 0;
END;

Then you can use it like this:

DECLARE
  r integer;
BEGIN
   For aCol in (SELECT TABLE_NAME, COLUMN_NAME FROM user_tab_cols WHERE DATA_TYPE = 'VARCHAR2') LOOP
      -- What about CHAR and CLOB data types?
      execute immediate 'select count(*) from '||aCol.TABLE_NAME||' WHERE isNumeric('||aCol.COLUMN_NAME||') = 0' into r;
      if r = 0 then
         DBMS_OUTPUT.put_line(aCol.TABLE_NAME ||' '||aCol.COLUMN_NAME ||' contains numeric values only');
      end if;
   end loop;
end;

Note, the performance of this PL/SQL block will be poor. Hopefully this is a one-time-job only.

Fabian Barney
  • 14,219
  • 5
  • 40
  • 60
Wernfried Domscheit
  • 54,457
  • 9
  • 76
  • 110
  • Thanks a lot for your affort! I tested it and it works. Of course this does not perform very well but this is totally ok for the given purpose. – Fabian Barney Dec 19 '17 at 10:40
1

There are two possible approaches: dynamic SQL (DSQL) and XML.

First one was already demonstrated in another reply and it's faster.

XML approach just for fun

create or replace function to_number_udf(p in varchar2) return number
  deterministic is
  pragma udf;
begin
  return p * 0;
  exception when invalid_number or value_error then return 1;
end to_number_udf;
/

create table t_chk(str1, str2) as
select '1', '2' from dual union all
select '0001.1000', 'helloworld' from dual;

SQL> column owner format a20
SQL> column table_name format a20
SQL> column column_name format a20
SQL> with tabs_to_check as
  2  (
  3  select 'collection("oradb:/'||owner||'/'||table_name||'")/ROW/'||column_name||'/text()' x,
  4         atc.*
  5    from all_tab_columns atc
  6   where table_name = 'T_CHK'
  7     and data_type = 'VARCHAR2'
  8     and owner = user
  9  )
 10  select --+ no_query_transformation
 11         owner, table_name, column_name
 12    from tabs_to_check ttc, xmltable(x columns "." varchar2(4000)) x
 13  group by owner, table_name, column_name
 14  having max(to_number_udf(".")) = 0;

OWNER                TABLE_NAME           COLUMN_NAME
-------------------- -------------------- --------------------
TEST                 T_CHK                STR1

PS. On Oracle 12.2 you can use to_number(... default ... on conversion error) instead of UDF.

Dr Y Wit
  • 2,000
  • 9
  • 16
  • Thanks for your affort! I did not test it, because another answer already worked for me. Nevertheless I appreciate your work. Thanks. – Fabian Barney Dec 19 '17 at 10:44
1

The faster way to check if a string is all digits vs. contains at least one non-digit character is to use the translate function. Alas, due to the non-SQL Standard way Oracle handles empty strings, the form of the function we must use is a little complicated:

translate(input_string, 'z0123456789', 'z')

(z can be any non-digit character; we need it so that the third argument is not null). This works by translating z to itself and 0, etc. to nothing. So if the input string was null or all-digits, and ONLY in that case, the value returned by the function is null.

In addition: to make the process faster, you can test each column with an EXISTS condition. If a column is not meant to be numeric, then in most cases the EXISTS condition will become true very quickly, so you will have to inspect a very small number of values from such columns.

As I tried to make this work, I ran into numerous side issues. Presumably you want to look in all schemas (except SYS and perhaps SYSTEM). So you need to run the procedure (anonymous block) from an account with SYSDBA privileges. Then - I ran into issues with non-standard table and column names (names starting with an underscore and such); which brought to mind identifiers defined in double-quotes - a terrible practice.

For illustration, I will use the HR schema - on which the approach worked. You may need to tweak this further; I wasn't able to make it work by changing the line

and owner = 'HR'

to

and owner != 'SYS'

So - with this long intro - here is what I did.

First, in a "normal" user account (my own, named INTRO - I run a very small database, with only one "normal" user, plus the Oracle "standard" users like SCOTT, HR etc.) - so, in schema INTRO, I created a table to receive the owner name, table name and column name for all columns of data type VARCHAR2 and which contain only "numeric" values or null (numeric defined the way you did.) NOTE HERE: If you then want to really check for all numeric values, you will indeed need a regular expression, or something like what Wernfried has shown; I would still, otherwise, use an EXISTS condition rather than a COUNT in the anonymous procedure.

Then I created an anonymous block to find the needed columns. NOTE: You will not have a schema INTRO - so change it everywhere in my code (both in creating the table and in the anonymous block). If the procedure completes successfully, you should be able to query the table. I show that at the end too.

While logged in as SYS (or another user with SYSDBA powers):

create table intro.cols_with_numbers (
  owner_name  varchar2(128),
  table_name  varchar2(128),
  column_name varchar2(128)
);

declare x number;
begin
  execute immediate 'truncate table intro.cols_with_numbers';
  for t in ( select owner, table_name, column_name
             from   dba_tab_columns
             where  data_type like 'VARCHAR2%'
               and  owner = 'HR'
           ) 
  loop
    execute immediate 'select case when exists (
                                select *
                                from ' || t.owner || '.' || t.table_name ||
                              ' where  translate(' || t.column_name || ',
                                         ''z0123456789'', ''z'') is not null
                              ) then 1 end
                       from   dual'    
    into x;
    if x is null then
      insert into intro.cols_with_numbers (owner_name, table_name, column_name)
         values(t.owner, t.table_name, t.column_name);
    end if;
  end loop;
end;
/

Run this procedure and then query the table:

select * from intro.cols_with_numbers;

no rows selected

(which means there were no numeric columns in tables in the HR schema, in the wrong data type VARCHAR2 - or at least, no such columns that had only non-negative integer values.) You can test further, by intentionally creating a table with such a column and testing to see it is "caught" by the procedure.

ADDED - Here is what happens when I change the owner from 'HR' to 'SCOTT':

PL/SQL procedure successfully completed.


OWNER_NAME           TABLE_NAME           COLUMN_NAME        
-------------------- -------------------- --------------------
SCOTT                BONUS                JOB                 
SCOTT                BONUS                ENAME   

so it seems to work fine (although on other schemas I sometimes run into an error... I'll see if I can figure out what that is).

In this case the table is empty (no rows!) - this is one example of a "false positive" you may find. (More generally, you will get a false positive if everything in a VARCHAR2 column is null - in all rows of the table.)

NOTE also that a column may have only numeric values and still the best data type would be VARCHAR2. This is the case when the values are simply identifiers and are not meant as "numbers" (which we can compare to each other or to fixed values, and/or with which we can do arithmetic). Example - a SSN (Social Security Number) or the equivalent in other countries; the SSN is each person's "official" identifier for doing business with the government. The SSN is numeric (actually, perhaps to accentuate the fact it is NOT supposed to be a "number" despite the name, it is often written with a couple of dashes...)

  • Wow, thanks a lot for your affort! Currently I do not have SYSDBA power here. But this here seems to be thought and tested very thoroughly. I really appreciate this! – Fabian Barney Dec 19 '17 at 10:37
  • @FabianBarney - If you don't have DBA privileges, then you can only do this either (1) for the tables in your own schema, or (2) for tables to which you have privileges. If so, you select from ALL_TAB_COLUMNS instead of DBA_TAB_COLUMNS. I also found what was tripping my solution: backup tables (in the recycle bin). At the top of the anonymous procedure, you need to add `... and table_name not like 'BIN$'` to exclude those from the search. –  Dec 19 '17 at 13:23