-2

I need an explanation and an advice on the execution plan in Oracle.

I have 3 tables in my Oracle database:

  • t (id, name, t1_id, t2_id)
  • t1 (id, name)
  • t2 (id, name)

id is a primary key in each table.

Table t has 13 000 000 rows, t1 has 4 000 000 rows and t2 has 1 000 000 rows.

The following query shows "hash join" in execution plan:

select
    t.*
from t
left join t1 on t1.id = t.t1_id

But the same query with table t2 instead of t1 doesn't show "hash join" in execution plan (there is only "table access" of table t):

select
    t.*
from t
left join t2 on t2.id = t.t2_id

In my opinion none of two queries should show "hash join" in execution plan because left join operation can not change the number of rows in the result set due to primary keys and we do not choose any of fields from t1 and t2 in the select statement. In case of table t2 it works good. Why the execution plan is different in case of table t1?

To make it more clear, I need a such behavior because in my real case there are tens of different left joins and different users of my BI will choose different set of fields. But no one will choose all fields. And I want Oracle to implement left joins only for those tables, which fields are choosen. If Oracle will implement all left joins it will be too long query.

Room'on
  • 101
  • 1
  • 7
  • databases use hash index to get the coln when you use = it is the festest way to get the correct row from t2 – nbk Aug 15 '23 at 20:35
  • but I don't get columns of t1 or t2 in my select statements, only column of table t – Room'on Aug 15 '23 at 20:44
  • that doesn't matter first a sql serverprocess the from, where is a join (and where applies the first comment) and the comes the select, basically you don't need to t2 at all, so, but the engive of the sql server doesn't care – nbk Aug 15 '23 at 20:47
  • why the behavior is different for t1 and t2? – Room'on Aug 15 '23 at 20:51
  • the query optimizer thinks that this is faster than the hash – nbk Aug 15 '23 at 21:17
  • Why? In case of t1 there are "table access" to t and "hash join" on t1, in case of t2 there is only "table access" to t. In case of t1 "table access" + "hash join" is not faster than only "table acess". Why optimizer thinks that this is faster? Is there a way to get the same behavior for t1, as for t2? – Room'on Aug 15 '23 at 21:48
  • if your t2 has the same number of rows and a similar granulation, you would get the same explain(which you didn't show), why a query optimizer does what,for that you need to debug the source code, but as nobody will ever program such a thingm it is a good question, that never will be answered – nbk Aug 15 '23 at 22:06

1 Answers1

0

Oracle cannot bypass the join simply because it's an outer join and you aren't requesting any columns. If the table to which you are outer joining has multiple rows per key, then your result set would be increased accordingly, one row from t for every matching row in t1, whether you are asking to see any t1 columns or not. Therefore semantically Oracle has to perform the join.

The only time it can skip an unreferenced outer join is either when:

  1. it knows that an outer join will by definition find 0 matches (e.g. if you use AND logic in the join clause with a literal false expression (1=2 AND ...), or if you key on a literal NULL),
  2. or it knows that the join key cannot have duplicate values. It will only know this if a unique index is created on the join key. If the join key is unique, then it cannot increase the row count of the driving query so isn't needed for that purpose. Check t2 and see if the index on id is unique, then compare the index on t1.id and see if it's non-unique (or missing). There would be your difference. Thanks to the index, it knows something about t2 that it doesn't know about t1, enabling it to prune the table from the query altogether.

As to the join method, Oracle will choose between (1) nested loops, (2) sort-merge join, (3) hash join based on its costing methodology and assumptions about your data. Your queries are applying no predicates, and the driving result set is large - it's no wonder it prefers a hash join. It's correct to do so. A nested loops (assuming you have an index on t1.id) would require 13 million index lookups and if matches are found another 13 million single block gets from the table segment. That's a lot more work than a few thousand multiblock reads (especially if they go direct path) to do a full table scan and building a modestly sized hash table in memory. Join method has nothing to do with logic/results and is unrelated to your issue with the t2 join being missing. It's purely a performance consideration once all the join inputs are known.

Paul W
  • 5,507
  • 2
  • 2
  • 13
  • Thank you for the answer. Join key cannot have duplicate values because t1.id and t2.id are primary keys. As I understand, it is the same as unique index. Do you have any other ideas why the behavior of t1 and t2 can be different? – Room'on Aug 16 '23 at 06:48
  • Do you have referential (ie foreign key) constraints defined between the tables? – BobC Aug 16 '23 at 11:39
  • No, there are no foreign keys. – Room'on Aug 16 '23 at 11:41
  • @Room'on - Constraints can be disabled, indexes can be unusable. Did you check by querying `all_indexes`? Are both primary key indexes in a usable state? If the one on `t1.id` is unusable, it can't be used to enforce uniqueness. If that's the case then rebuild it. If you still get a difference, please post the actual execution plan for each of your queries. Your question doesn't actually show us the plan. – Paul W Aug 16 '23 at 12:12
  • I have found the reason. Table t1 was partitioned and t2 was not. Now my problem is that I can't create the unique global index on t1.id – Room'on Aug 16 '23 at 14:04
  • 1
    @Room'on, ah, so there was no unique index on `t1.id`, as I suspected. You can certainly create it... `create unique index pk_t1 on t1(id)`. As long as you leave out the `local` keyword it'll be global and you won't have to include the partition key. If you want to add an explicit constraint on top of it (not necessary), you can also do that after creating the index manually (`alter table t1 add constraint pk_t1 primary key (id) using index pk_t1`). But the unique index by itself even without a constraint will do the trick. – Paul W Aug 16 '23 at 14:30
  • I have an error: SQL Error [1408] [72000]: ORA-01408: such column list already indexed. Should I drop PK constraint or is there another way? – Room'on Aug 16 '23 at 14:38
  • Is the existing index non-unique? If so, then drop it and recreate it as unique. You can get the index name from `all_indexes`. – Paul W Aug 16 '23 at 14:49
  • There is only one row in ```all_indexes``` with the name of constraint for primary key and I can't drop it: SQL Error [2429] [72000]: ORA-02429: cannot drop index used for enforcement of unique/primary key. Is the primary key global or local? – Room'on Aug 16 '23 at 15:09
  • If you have a PK constraint on the index, it's already unique. If it's unique it's probably local and has the partitioning key added as a second column in the index. In this case, you can (1) drop the constraint (`alter table t1 drop constraint pk_t1`), (2) drop the index (`drop index pk_t1`) which may or may not have already auto-dropped, (3) create the new index, (4) create the new constraint. By the way, you can see locality by querying `all_part_indexes.locality` or checking if `all_indexes.tablespace_name` is null (meaning it's local) or by querying `all_ind_partitions`. – Paul W Aug 16 '23 at 15:16