0

How can I improve the performance by joining two big tables and sort by 1st able unique index?

I need only 1st table data with sort by. without order query will performance so fast.

Here is the example of queries

select a.* from T1 a, T2 b where a.c1 = b.c2; 
select a.* from T1 a, T2 b where a.c1 = b.c2 order by a.id;

just FYI, T1 and T2 have the proper index.

T1 table count is "54483938"
T2 table count is "54483820"

I am more interest in T1 data with sort by which T1 Records exist on T2.

I tried using an in operator query, it took me into 300 sec.

F0XS
  • 1,271
  • 3
  • 15
  • 19
  • Not sure I understand the question. Do you expect that the result of the join will also be close to 54 million rows? How quickly do you expect those to be ordered? Now, you mention that a.id has a unique index... is it also NOT NULL? If so, if you only select a.id from the join, then perhaps you could get faster execution (taking advantage of the index being ordered already); but NOT if you also need the other columns from table a! –  Dec 05 '17 at 15:30
  • 1
    Tip of today: Switch to modern, explicit `JOIN` syntax. Easier to write (without errors), easier to read (and maintain), and easier to convert to outer join if needed. – jarlh Dec 05 '17 at 15:31
  • could you please provide the syntax on that and would like to know the Oracle version 11g have to support or not? – user3260053 Dec 05 '17 at 15:34
  • @mathguy, it might come with 54 millions of records. index column is not null field. – user3260053 Dec 05 '17 at 15:36
  • `from T1 a join T2 b on a.c1 = b.c2`. Of course Oracle 11g supports this standard SQL syntax. The comma-separated syntax you are using was made redundant in 1992 (though it took Oracle until 2001 to adopt it). – Thorsten Kettner Dec 05 '17 at 15:36
  • @ThorstenKettner - "redundant" is not quite right. There are still a small number of places where Oracle **only** works with its own proprietary join syntax, and does not work with ANSI join syntax. One of those is in the definition of materialized views - or actually in whether fast refresh (or refresh on commit - I never remember, it's one or the other) can be set on the MV. If you want fast refresh on commit and the view has joins, you MUST use comma-syntax for joins. –  Dec 05 '17 at 15:40
  • @mathguy: Arghh, really? Well, it's only been some twenty-five years since the new join syntax was introduced. Oracle may still be working on this ;-) – Thorsten Kettner Dec 05 '17 at 15:43
  • @ThorstenKettner - Yes, really. There is at least one more similar instance, which I can never remember since I never ran into it. Especially odd since Oracle has already written the necessary code to translate ANSI syntax to its own (which it still uses internally); why those exceptions still exist is quite the mystery in my view. –  Dec 05 '17 at 15:47
  • @user3260053 How many T1 records have a match in T2? – Thorsten Kettner Dec 05 '17 at 15:59
  • out of curiosity, can you indicate how the result set is going to be consumed (e.g. a staging table in a data warehouse) where you need it sorted. – Patrick Bacon Dec 05 '17 at 18:16

1 Answers1

2

You can try the three forms of the query:

  • join (which you have)
  • in (which you claim to have run)
  • exists

The exists version is:

select a.*
from T1 a
where exists (select 1 from T2 b where a.c1 = b.c2)
order by a.id;

For this query, I would recommend indexes on T1(id, c1) and T2(c2).

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Why a composite index on `a(id, c1)`? And also, don't you need an index on `a(c1)` alone, to help the join? –  Dec 05 '17 at 15:42
  • 1
    @mathguy: The idea is to have the data sorted already with this index. So if the index were read and T2 looked up in nested loops, you'd get the matches already ordered by id. With such big tables and no criteria, however, I think the DBMS will read the full tables and hash join the records anyway. – Thorsten Kettner Dec 05 '17 at 15:57