
I'm dealing with a huge table that has 618 columns. The team does not want to embark on the research of which columns are not being used and modify the code to delete them, and I understand the time constraint. One suggestion is to identify which columns of that table are the most accessed or frequently used and change the order of the columns. I left the tuning to the experts, and I got assigned to identify such columns.

So, I queried ALL_TAB_COLUMNS trying to figure out a simple way to complete the task. Four columns caught my attention: NUM_DISTINCT, DENSITY, NUM_NULLS, and SAMPLE_SIZE. I thought that the ratio between SAMPLE_SIZE and NUM_NULLS could be a way to identify the most used columns: the closer this ratio is to 1, the higher the probability of the column being used. But as usual, it cannot be that easy, because I get crazy ratios: some columns have NUM_NULLS around 80 million, while SAMPLE_SIZE is only 141.

I wonder if someone had a similar situation or if someone has a suggestion of what is the best approach for this. Thank you in advance.

I executed the query below and generated a list of the columns based on the stats. At least this way, I have an idea of what I need to deliver.

select ts.owner, ts.table_name, ts.column_name, tc.column_id,
       t.num_rows, ts.num_nulls,
       case
         when nvl(ts.num_nulls, 0) > 0 then 1 - round(ts.num_nulls / t.num_rows, 6)
         when nvl(ts.num_nulls, 0) = 0 then 1
       end as usageperc,
       ts.num_distinct, ts.density
from   all_tables t
join   all_tab_col_statistics ts
on     ts.owner = t.owner
and    ts.table_name = t.table_name
join   all_tab_columns tc
on     tc.owner = ts.owner
and    tc.table_name = ts.table_name
and    tc.column_name = ts.column_name
where  t.owner = 'THE_OWNER'
and    t.table_name = 'THE_HUGE_TABLE'
order  by ts.num_nulls, ts.density desc

jjsai
    That's going to be pretty tough to do. You can query v$sqlarea, which shows recently executed queries, including the number of times executed and the SQL text. Search that with the results from user_tab_columns for the table in question. Note that v$sqlarea is transient and the data in there will change all the time. BTW, having a table with 618 columns sounds like a giant design flaw. Good luck! – OldProgrammer Mar 26 '20 at 21:16
  • I will try different approaches and present what I have. I will check yours and see what I get. "having a table with 618 columns sounds like a giant design flaw" Believe me, I told them, but apparently, it is "too late" Thank you so much! – jjsai Mar 26 '20 at 22:45
  • You want to find columns more often used in queries than others and modify the table so those columns come first when selecting *? This is a weird idea, and I don't think that you'd gain anything with that approach. What is it you *want* to gain after all? I don't even understand the objective. – Thorsten Kettner Mar 27 '20 at 00:02
  • What is the problem you are observing? Is it excessive *parsing time*? – Marmite Bomber Mar 27 '20 at 00:37
  • Agree with others that re-ordering the columns isn't necessarily going to help you with anything; dropping obsolete cols would reduce i/o on full scans. The info in all_tab_columns is telling you metrics about the data in the columns, but it can't tell you how that data is used, or how frequently it is used by the application. Also, identifying columns used by examining SQL will not be guaranteed accurate - you have no way to know if you've captured every possible SQL statement. The _only_ way to know for sure is to go through the application source code; anything else is just a guess. – pmdba Mar 27 '20 at 02:56
  • See here: https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:9530992000346894870 for other benefits of keeping your table to 255 columns or less. – pmdba Mar 27 '20 at 02:56
  • Thank you, everyone, for your comments. I'm just the messenger here. I'm aware of the considerable impact on performance that having 618 columns has. I'm not a tuning expert, so I'm not aware of why the team decided to do this. My goal (the task assigned to me) is to provide the list of the columns that are used most, from one particular table. Again, thank you for all your comments. I appreciate all your help. – jjsai Mar 27 '20 at 14:41
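Following OldProgrammer's comment, a rough sketch of searching the cached SQL for statements that mention each column. This is an assumption-laden example (THE_OWNER and THE_HUGE_TABLE are placeholders), v$sqlarea only covers statements still in the shared pool, and plain substring matching over-counts columns whose names are prefixes of other column names:

-- Sketch: for each column of the table, count cached statements whose text
-- mentions that column name. v$sqlarea is transient, so this only reflects
-- recently parsed SQL; "select *" statements name no columns at all.
select tc.column_name,
       count(*)          as statements,
       sum(s.executions) as total_executions
from   v$sqlarea s
join   all_tab_columns tc
on     instr(upper(s.sql_text), tc.column_name) > 0
where  tc.owner = 'THE_OWNER'
and    tc.table_name = 'THE_HUGE_TABLE'
and    upper(s.sql_text) like '%THE_HUGE_TABLE%'
group  by tc.column_name
order  by total_executions desc;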

1 Answer


The best way to see how often columns are used is to enable (fine-grained) auditing and let it run for a while. Then review the statements captured:

grant create session, unlimited tablespace 
  to u 
  identified by u;

create table u.t (
  c1 int, c2 int, c3 int
);

exec dbms_fga.add_policy ( 'U', 'T', 'AUDIT_COLS', enable => true, statement_types => 'INSERT,UPDATE,DELETE,SELECT' );

conn u/u

insert into t ( c1 ) values ( 1 );
insert into t values ( 2, 2, 2 );

select c2 from t;
select * from t;

update t
set    c3 = 1
where  c1 = 1;

delete t
where  c2 is null;
commit;

conn chris

select sql_text 
from   dba_fga_audit_trail;

SQL_TEXT                               
insert into t ( c1 ) values ( 1 )       
insert into t values ( 2, 2, 2 )        
select c2 from t                        
select * from t                         
update t
set    c3 = 1
where  c1 = 1    
delete t
where  c2 is null  

Note you get the complete statements, so you'll need to do some wrangling to extract the columns.
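For example, a crude tally of column mentions can be done directly in SQL. This is only a sketch with the same caveats as any text matching: a column whose name is a prefix of another column's name gets over-counted, and "select *" statements mention no columns at all (schema and table names below are placeholders):

-- Sketch: count how often each column name appears in the audited statements.
select tc.column_name,
       count(*) as mentions
from   dba_fga_audit_trail a
join   all_tab_columns tc
on     instr(upper(a.sql_text), tc.column_name) > 0
where  tc.owner = 'THE_OWNER'
and    tc.table_name = 'THE_HUGE_TABLE'
group  by tc.column_name
order  by mentions desc;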

Chris Saxon