
This is a recurring problem for me. I have statements that work well for a while, and then the optimizer decides to choose another execution plan. This even happens when I query for exactly one (composite) primary key.

When I look up the execution plan in dba_hist_sql_plan, it shows a cost of 20 for the query using the primary key index and a cost of 270 for the query doing a full table scan.
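For reference, the plan lines below come from a query along these lines (a sketch; the sql_id is a placeholder for the statement in question):

```sql
-- Pull both recorded plans for one statement from the AWR plan history.
select plan_hash_value, id, operation, options, cost, search_columns
  from dba_hist_sql_plan
 where sql_id = '&sql_id'          -- placeholder: sql_id of the statement
 order by plan_hash_value, id;
```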

plan_hash_value   Id  Operation          Options                Cost  Search_Columns

2550672280         0  SELECT STATEMENT                            20
2550672280         1  PARTITION HASH     SINGLE                    20
2550672280         2  TABLE ACCESS       BY LOCAL INDEX ROWID      20
2550672280         3  INDEX              RANGE SCAN                19   1

3908080950         0  SELECT STATEMENT                           270
3908080950         1  PARTITION HASH     SINGLE                   270
3908080950         2  TABLE ACCESS       FULL                     270

I already noticed that the optimizer only uses the first column of the primary key index and then does a range scan. But my real question is: why does the optimizer choose the higher-cost execution plan? It's not that both execution plans are used at the same time; I notice a switch within one snapshot, and then it stays like that for several hours or days. So it can't be an issue of bind peeking.
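One way to check whether the two plans belong to different child cursors of the same statement (which would point towards adaptive cursor sharing rather than a one-off plan switch) is a query along these lines; the sql_id is again a placeholder:

```sql
-- List all child cursors currently in the library cache for the statement.
select sql_id, child_number, plan_hash_value, executions,
       is_bind_sensitive, is_bind_aware, last_active_time
  from v$sql
 where sql_id = '&sql_id'          -- placeholder: sql_id of the statement
 order by child_number;
```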

Our current solution is that I call our DBA and he flushes the statement cache. But this is not really sustainable.
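A somewhat less invasive alternative (a sketch, not what we currently do; it assumes the DBMS_SHARED_POOL package is installed and the DBA has the required privileges) would be to purge only the offending cursor instead of flushing everything:

```sql
-- Look up the cursor's address and hash value ...
select address, hash_value
  from v$sqlarea
 where sql_id = '&sql_id';         -- placeholder: sql_id of the statement

-- ... and purge just that cursor ('C' = cursor), forcing a fresh hard parse.
exec dbms_shared_pool.purge('&address, &hash_value', 'C');
```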

EDIT: The SQL looks something like this: `select * from X where X.id1 = ? and X.id2 = ? and X.id3 = ?`, with (id1, id2, id3) being the composite primary key (with a unique index) on the table.

EasterBunnyBugSmasher
  • There can be many reasons, and some of them are valid. The most common one is when the execution plan is applicable to some other schema. Other reasons can be dynamic sampling or performance baselines. – ibre5041 Dec 01 '15 at 12:52
  • Where is the SQL? Did you look at **adaptive cursor sharing**? – Lalit Kumar B Dec 01 '15 at 12:55
  • The SQL is simple, like `select * from x where x.id1 = ? and x.id2 = ? and x.id3 = ?` – EasterBunnyBugSmasher Dec 01 '15 at 14:06
  • Considering adaptive cursor sharing: since I'm querying on exactly the 3 columns of the composite primary key, it cannot be an issue of skewed histograms; there must be exactly one row or none in my result set. But I guess adaptive cursor sharing can be an issue because of the strange behaviour where the optimizer decides to only look up the first column in the index and then do a range scan. I might have to tackle that problem. – EasterBunnyBugSmasher Dec 01 '15 at 15:10
  • Is it possible that the plans are different because something is occasionally preventing an index access? For example, is the index being periodically dropped and rebuilt? (It's a stupid thing to do, but it happens.) – Jon Heller Dec 02 '15 at 06:00
  • the question is answered in [this answer](http://stackoverflow.com/questions/40159348/make-the-optimizer-use-all-columns-of-an-index/41779815?noredirect=1#comment70750230_41779815) – EasterBunnyBugSmasher Jan 23 '17 at 11:37

3 Answers


Maybe it's related to a bug in Oracle 11g:
Bug 18377553 : POOR CARDINALITY ESTIMATE WITH HISTOGRAMS AND VALUES > 32 BYTES

When your data looks like this:

AAAAAAAAAAAAAAAAAAAAmyvalue
AAAAAAAAAAAAAAAAAAAAsomeothervalue
AAAAAAAAAAAAAAAAAAAAandsoon
B1234

Histograms do not work well.

The solution is to disable histograms on the primary key columns, and everything will start working smoothly again.
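One possible way to do that (a sketch; the table and column names are taken from the question, and SIZE 1 means "no histogram" for the listed columns) is a DBMS_STATS call along these lines:

```sql
begin
  dbms_stats.gather_table_stats(
    ownname    => user,
    tabname    => 'X',
    -- default histogram behaviour for all columns, but none on the PK columns
    method_opt => 'FOR ALL COLUMNS SIZE AUTO FOR COLUMNS SIZE 1 ID1 ID2 ID3',
    cascade    => true
  );
end;
/
```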

dcieslak
  • Very interesting bug. Our composite primary key consists (in this order) of: 1) an id from a sequence, max ~700K, with a roughly equal distribution; 2) a date representing a day between 2000 and 2018 (in most cases we have between 365 and 3650 different days for one id); 3) a date representing when the row was logically deleted, so in ~95% of all rows this is a day representing "our" infinite future. – EasterBunnyBugSmasher Dec 10 '15 at 13:18
  • And a DATE is 7 bytes, so we don't have any values > 32 bytes. I would understand if the optimizer only chose the first 2 columns for the search, because in 95% of all cases it gets only 1 row then. But stopping after the first column is strange. – EasterBunnyBugSmasher Dec 10 '15 at 13:21

Most likely the clustering factor and blevel of the index are very high. Check the blevel by querying dba_indexes. If the blevel is greater than 3, try rebuilding the index.

Also check whether the index created for the primary key is unique or not. According to the plan it is using a range scan instead of a unique scan, so most likely the index is not unique.
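Both points can be checked with one query against dba_indexes (a sketch; the table name is taken from the question):

```sql
-- Uniqueness, height, and clustering factor of all indexes on the table.
select index_name, uniqueness, blevel, clustering_factor, last_analyzed
  from dba_indexes
 where table_name = 'X';
```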

Durga Viswanath Gadiraju
  • A poorly performing index could explain why a full table scan is preferable to an index scan. But shouldn't that be reflected in the cost, and the lowest-cost plan still be used? – Jon Heller Dec 02 '15 at 06:01
  • The index is unique. The way I understand it, the range scan is a result of the optimizer only going through the index for the first column. The blevel is 2, and the clustering_factor is a big issue for this table and varies a lot. – EasterBunnyBugSmasher Dec 02 '15 at 12:20

Apparently the optimizer doesn't correctly display the cost of type conversions. The root cause of this problem was an incorrect type mapping for a date value. While the column in the database is of type DATE, the JDBC type was incorrectly java.sql.Timestamp. To compare a DATE column with a Timestamp search parameter, every value in the table first has to be converted to Timestamp, which adds cost and renders the index unusable.
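As a side note (this is not the fix described above, which was correcting the JDBC type mapping): if a TIMESTAMP bind cannot be avoided, one workaround is to cast the bind back to DATE inside the statement so the conversion happens on the bind value rather than on the indexed column. A sketch, assuming id2 and id3 are the DATE columns mentioned in the comments:

```sql
-- Hypothetical rewrite of the original query: the CAST keeps the comparison
-- DATE-to-DATE, so the primary key index remains usable.
select *
  from X
 where X.id1 = ?
   and X.id2 = cast(? as date)
   and X.id3 = cast(? as date);
```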

EasterBunnyBugSmasher