I'm dealing with a huge table that has 618 columns. The team does not want to embark on researching which columns are unused and modifying the code to drop them, and I understand the time constraint. One suggestion is to identify which columns of that table are the most accessed or most frequently used and reorder the columns accordingly. I left the tuning to the experts, and I got assigned to identify those columns.
So, I queried ALL_TAB_COLUMNS trying to figure out a simple way to complete the task. Four columns caught my attention: NUM_DISTINCT, DENSITY, NUM_NULLS, and SAMPLE_SIZE. I thought the ratio between SAMPLE_SIZE and NUM_NULLS could be a way to identify the most used columns: the closer the ratio is to 1, the higher the probability of the column being used. But as usual, it can't be that easy, because I get crazy ratios; some columns have NUM_NULLS in the 80-million range while SAMPLE_SIZE is only 141.
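(The ratios blow up because NUM_NULLS is scaled up to represent the whole table, while SAMPLE_SIZE only records how many rows the stats job actually read, so the two numbers aren't on the same scale. A quick sanity check, as a sketch with 'THE_OWNER' and 'THE_HUGE_TABLE' standing in for the real names, is to compare the sample to the full row count:

    -- How much of the table did the last stats gathering actually sample?
    -- If SAMPLE_PCT is tiny, any ratio built directly on SAMPLE_SIZE is meaningless.
    select table_name,
           num_rows,
           sample_size,
           round(sample_size / nullif(num_rows, 0) * 100, 4) as sample_pct
    from   all_tab_statistics
    where  owner      = 'THE_OWNER'
    and    table_name = 'THE_HUGE_TABLE';

With a sample that small, NUM_NULLS / NUM_ROWS is the ratio that stays meaningful, since both are expressed at table scale.)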
I wonder if someone has had a similar situation, or if anyone has a suggestion for the best approach here. Thank you in advance.
I executed the query below and generated a list of the columns based on the stats. At least this way, I have an idea of what I need to deliver.
    -- Rank columns by how often they are populated: the fewer NULLs relative
    -- to NUM_ROWS, the more likely the column carries real data.
    select ts.owner,
           ts.table_name,
           ts.column_name,
           tc.column_id,
           t.num_rows,
           ts.num_nulls,
           case
               when nvl(ts.num_nulls, 0) > 0
                   then 1 - round(ts.num_nulls / t.num_rows, 6)
               when nvl(ts.num_nulls, 0) = 0
                   then 1
           end as usage_perc,
           ts.num_distinct,
           ts.density
    from   all_tables t
    join   all_tab_col_statistics ts
           on  ts.owner      = t.owner
           and ts.table_name = t.table_name
    join   all_tab_columns tc
           on  tc.owner       = ts.owner
           and tc.table_name  = ts.table_name
           and tc.column_name = ts.column_name
    where  t.owner      = 'THE_OWNER'
    and    t.table_name = 'THE_HUGE_TABLE'
    order  by ts.num_nulls, ts.density desc;
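One caveat: NUM_NULLS only tells you how often a column is populated, not how often it is queried. If what you're really after is access frequency, Oracle (11.2 and later) also records which columns show up in predicates. A minimal sketch, assuming you have EXECUTE on DBMS_STATS and a representative workload has run against the table:

    -- CLOB report of recorded column usage: equality, equijoin, range, LIKE
    -- and IS NULL predicates seen by the optimizer. It does not capture plain
    -- SELECT-list usage, so treat it as a complement to the stats query above.
    select dbms_stats.report_col_usage(
               ownname => 'THE_OWNER',
               tabname => 'THE_HUGE_TABLE'
           ) as col_usage_report
    from   dual;

Between the two views, columns that are mostly NULL and never appear in the usage report are the strongest candidates to push to the end of the column order.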