Oracle Cost-Based Optimizer: how to use statistics on foreign key column of fact table when criteria is on the dimension table

Question

I am new to this community and I did search for this question. Apologies if I'm asking something that's been asked before.

I am working on a dimensional data warehouse with fact tables and dimensions. This is an Oracle 12 database. The dimensions have a surrogate key column, which is also what appears in a fact table, and then a column with a business value, which maps one-to-one with the surrogate key (along with other attribute columns). The fact tables have a number of foreign keys and then some columns to be aggregated. Some of these foreign keys have very unevenly distributed data so we have generated histograms on these columns in the fact table so the cost-based optimizer knows when a very common value is selected as a criteria that it won't be very selective. For example, we a dimension value which is "blank" about 85% of the time in our fact data but has 20,000 distinct values for the other 15% of the rows. We have a row in our dimension table with a surrogate key which represents "blank" for this dimension value, plus rows for the other 20,000 values. Without the histogram the optimizer thinks these 20,000 values are equally distributed so it can make very bad choices when someone specifies the blank one.

This works fine when I run a query with criteria specified on our fact table. The optimizer recognizes the histogram statistics and provides cardinality estimates that are in the right ballpark. However, if I specify the criteria on the business key on the dimension side of the join, the statistics are not used and the cardinality estimate is way off.

select *
from FACT
  , DIM
where FACT.surrogate_key = DIM.surrogate_key
  and DIM.surrogate_key = 0 -- (zero means blank)

Explain plan cardinality estimate: 53 million rows (about right). There are about 65 million rows total in the fact table and about 53 million of them represent data where this attribute is blank.

However, if I filter on the business key, which is what users actually do, then the cardinality estimate is way off.

select *
from FACT
  , DIM
where FACT.surrogate_key = DIM.surrogate_key
  and DIM.business_key = '(blank)'

Explain plan cardinality estimate: 14,000 rows (not even close)

How do I get the CBO to use the histogram when the criteria is specified on the dimension table (and not on the join column)?

Thank you.

score 0 · Answer 1 · answered May 17 '18 at 01:44

you should use oracle 10053 event trace to track the information you need.

please note that only hard parse will generate 10053 event trace,

soft parse will not generate 10053 event trace.

but even you get 10053 event trace and read it carefullly,

the mechanism of oracle cbo is not clear only based on 10053 event trace.

the following is an example about 10053 event trace example:

sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Thu May 17 09:36:41 2018

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> alter session set events '10053 trace name context forever';

Session altered.

SQL> alter session set tracefile_identifier='yaoweijq';

Session altered.

SQL> select * from obj$ where obj# = 0;

no rows selected

then you could execute

show parameter user_dump_dest

the output will contain a directory, under the directory,

ls -ltr yaoweijq

the file ends with .trc is the 10053 event trace file.

Thanks for the response. But I already know what the SQL is doing. In the explain plan it has a lookup in the dimension table where it looks up a row based on the business key filter. But then, when it filters the fact table, it shows that the filter is on some arbitrary value (something like FACT.surrogate_key = C0). So what I think is that the optimizer doesn't know what value the first part of the query is going to find, and so it gets an row estimate by just multiplying the row count by the density in that column. I want it to estimate the rows based on the actual surrogate key value. — aloofball, May 18 '18 at 14:50
about the second sql of business_key, yes, the cbo do not know the actual data reference rate(think you should get the key information) about business_key and surrogate_key, as the default histgrm will not gather it. it seems you could try the extended stats, the give us the test result. — yaoweijq, May 22 '18 at 07:01

Oracle Cost-Based Optimizer: how to use statistics on foreign key column of fact table when criteria is on the dimension table

1 Answers1