I am new to this community and I did search for this question. Apologies if I'm asking something that's been asked before.
I am working on a dimensional data warehouse with fact tables and dimensions. This is an Oracle 12 database. The dimensions have a surrogate key column, which is also what appears in a fact table, and then a column with a business value, which maps one-to-one with the surrogate key (along with other attribute columns). The fact tables have a number of foreign keys and then some columns to be aggregated. Some of these foreign keys have very unevenly distributed data so we have generated histograms on these columns in the fact table so the cost-based optimizer knows when a very common value is selected as a criteria that it won't be very selective. For example, we a dimension value which is "blank" about 85% of the time in our fact data but has 20,000 distinct values for the other 15% of the rows. We have a row in our dimension table with a surrogate key which represents "blank" for this dimension value, plus rows for the other 20,000 values. Without the histogram the optimizer thinks these 20,000 values are equally distributed so it can make very bad choices when someone specifies the blank one.
This works fine when I run a query with criteria specified on our fact table. The optimizer recognizes the histogram statistics and provides cardinality estimates that are in the right ballpark. However, if I specify the criteria on the business key on the dimension side of the join, the statistics are not used and the cardinality estimate is way off.
select *
from FACT
, DIM
where FACT.surrogate_key = DIM.surrogate_key
and DIM.surrogate_key = 0 -- (zero means blank)
Explain plan cardinality estimate: 53 million rows (about right). There are about 65 million rows total in the fact table and about 53 million of them represent data where this attribute is blank.
However, if I filter on the business key, which is what users actually do, then the cardinality estimate is way off.
select *
from FACT
, DIM
where FACT.surrogate_key = DIM.surrogate_key
and DIM.business_key = '(blank)'
Explain plan cardinality estimate: 14,000 rows (not even close)
How do I get the CBO to use the histogram when the criteria is specified on the dimension table (and not on the join column)?
Thank you.