I have a table with 1.5 million rows. I run a query which fetches records having non repeating values in a column. I am observing a behaviour in which after creating indexes the performance of the query degrades. I also used dbms_stats with 100% estimate percentage(compute mode) to gather statistics so that oracle 11g CBO makes a more informed decision for the query plan, but it doesnt improve the query execution time.
SQL> desc tab3;
Name Null? Type
----------------------------------------------
COL1 NUMBER(38)
COL2 VARCHAR2(100)
COL3 VARCHAR2(36)
COL4 VARCHAR2(36)
COL5 VARCHAR2(4000)
COL6 VARCHAR2(4000)
MEASURE_0 VARCHAR2(4000)
MEASURE_1 VARCHAR2(4000)
MEASURE_2 VARCHAR2(4000)
MEASURE_3 VARCHAR2(4000)
MEASURE_4 VARCHAR2(4000)
MEASURE_5 VARCHAR2(4000)
MEASURE_6 VARCHAR2(4000)
MEASURE_7 VARCHAR2(4000)
MEASURE_8 VARCHAR2(4000)
MEASURE_9 VARCHAR2(4000)
The column measure_0
has 0.4 million unique values.
SQL> select count(*) from (select measure_0 from tab3 group by measure_0 having count(*) = 1) abc;
COUNT(*)
----------
403664
The following is the query with the execution plan, please note there are no indexes on the table.
SQL> set autotrace traceonly;
SQL> SELECT * FROM (
2 SELECT
3 (ROWNUM -1) AS COL1,
4 ft.COL1 AS OLD_COL1,
5 ft.COL2,
6 ft.COL3,
7 ft.COL4,
8 ft.COL5,
9 ft.COL6,
10 ft.MEASURE_0,
11 ft.MEASURE_1,
12 ft.MEASURE_2,
13 ft.MEASURE_3,
14 ft.MEASURE_4,
15 ft.MEASURE_5,
16 ft.MEASURE_6,
17 ft.MEASURE_7,
18 ft.MEASURE_8,
19 ft.MEASURE_9
20 FROM tab3 ft
21 WHERE MEASURE_0 IN
22 (
23 SELECT MEASURE_0
24 FROM tab3
25 GROUP BY MEASURE_0
26 HAVING COUNT(*) = 1
27 )
28 ) ABC WHERE COL1 >= 0 AND COL1 <=449;
450 rows selected.
Elapsed: 00:00:01.90
Execution Plan
----------------------------------------------------------
Plan hash value: 3115757351
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1243 | 28M| 717K (1)| 02:23:29 |
|* 1 | VIEW | | 1243 | 28M| 717K (1)| 02:23:29 |
| 2 | COUNT | | | | | |
|* 3 | HASH JOIN | | 1243 | 30M| 717K (1)| 02:23:29 |
| 4 | VIEW | VW_NSO_1 | 1686K| 3219M| 6274 (2)| 00:01:16 |
|* 5 | FILTER | | | | | |
| 6 | HASH GROUP BY | | 1 | 3219M| 6274 (2)| 00:01:16 |
| 7 | TABLE ACCESS FULL| TAB3 | 1686K| 3219M| 6196 (1)| 00:01:15 |
| 8 | TABLE ACCESS FULL | TAB3 | 1686K| 37G| 6211 (1)| 00:01:15 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("COL1">=0 AND "COL1"<=449)
3 - access("MEASURE_0"="MEASURE_0")
5 - filter(COUNT(*)=1)
Note
-----
- dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
354 recursive calls
0 db block gets
46518 consistent gets
45122 physical reads
0 redo size
43972 bytes sent via SQL*Net to client
715 bytes received via SQL*Net from client
31 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
450 rows processed
The query takes up 1.90 seconds. If I run the query again it takes 1.66 seconds. Why does it take more time in the first run?
In order to speed it up I created indexes on the two columns used in the query.
SQL> create index ind_tab3_orgid on tab3(COL1);
Index created.
Elapsed: 00:00:01.68
SQL> create index ind_tab3_msr_0 on tab3(measure_0);
Index created.
Elapsed: 00:00:01.83
When I fired the query after this for the first time it took a whooping 21 seconds to come back. Whereas subsequent runs bought it to 2.9 seconds. Why does oracle take so much time in the first run, is it warming up or something.. baffles me!
This is the plan when it takes 2.9 seconds-
450 rows selected.
Elapsed: 00:00:02.92
Execution Plan
----------------------------------------------------------
Plan hash value: 240271480
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1243 | 28M| 711K (1)| 02:22:15 |
|* 1 | VIEW | | 1243 | 28M| 711K (1)| 02:22:15 |
| 2 | COUNT | | | | | |
| 3 | NESTED LOOPS | | | | | |
| 4 | NESTED LOOPS | | 1243 | 30M| 711K (1)| 02:22:15 |
| 5 | VIEW | VW_NSO_1 | 1686K| 3219M| 6274 (2)| 00:01:16 |
|* 6 | FILTER | | | | | |
| 7 | HASH GROUP BY | | 1 | 3219M| 6274 (2)| 00:01:16 |
| 8 | TABLE ACCESS FULL | TAB3 | 1686K| 3219M| 6196 (1)| 00:01:15 |
|* 9 | INDEX RANGE SCAN | IND_TAB3_MSR_0 | 1243 | | 2 (0)| 00:00:01 |
| 10 | TABLE ACCESS BY INDEX ROWID| TAB3 | 1243 | 28M| 44 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("COL1">=0 AND "COL1"<=449)
6 - filter(COUNT(*)=1)
9 - access("MEASURE_0"="MEASURE_0")
Note
-----
- dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
660054 consistent gets
22561 physical reads
0 redo size
44358 bytes sent via SQL*Net to client
715 bytes received via SQL*Net from client
31 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
450 rows processed
I was expecting the time to be lower than when the table was non indexed. Why does the indexed version of the table takes more time to fetch the results than the non indexed version? If I am not wrong it is the TABLE ACCESS BY INDEX ROWID that is taking up time. Can I enforce oracle to use TABLE ACCESS FULL?
I then gathered statistics on the table so that CBO improves the plan with compute option. So now the statistics would be accurate.
SQL> EXECUTE dbms_stats.gather_table_stats (ownname=>'EQUBE67DP', tabname=>'TAB3',estimate_percent=>null,cascade=>true);
PL/SQL procedure successfully completed.
Elapsed: 00:01:02.47
SQL> set autotrace off;
SQL> select COLUMN_NAME,NUM_DISTINCT,SAMPLE_SIZE,HISTOGRAM,LAST_ANALYZED from dba_tab_cols where table_name = 'TAB3' ;
COLUMN_NAME NUM_DISTINCT SAMPLE_SIZE HISTOGRAM LAST_ANALYZED
------------------------------ ------------ ----------- --------------- ---------
COL1 1502257 1502257 NONE 27-JUN-12
COL2 0 NONE 27-JUN-12
COL3 1 1502257 NONE 27-JUN-12
COL4 0 NONE 27-JUN-12
COL5 1502257 1502257 NONE 27-JUN-12
COL6 1502257 1502257 NONE 27-JUN-12
MEASURE_0 405609 1502257 HEIGHT BALANCED 27-JUN-12
MEASURE_1 128570 1502257 NONE 27-JUN-12
MEASURE_2 1502257 1502257 NONE 27-JUN-12
MEASURE_3 185657 1502257 NONE 27-JUN-12
MEASURE_4 901 1502257 NONE 27-JUN-12
MEASURE_5 17 1502257 NONE 27-JUN-12
MEASURE_6 2202 1502257 NONE 27-JUN-12
MEASURE_7 2193 1502257 NONE 27-JUN-12
MEASURE_8 21 1502257 NONE 27-JUN-12
MEASURE_9 27263 1502257 NONE 27-JUN-12
I again ran the query
450 rows selected.
Elapsed: 00:00:02.95
Execution Plan
----------------------------------------------------------
Plan hash value: 240271480
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 31M| 718G| 8046 (2)| 00:01:37 |
|* 1 | VIEW | | 31M| 718G| 8046 (2)| 00:01:37 |
| 2 | COUNT | | | | | |
| 3 | NESTED LOOPS | | | | | |
| 4 | NESTED LOOPS | | 31M| 62G| 8046 (2)| 00:01:37 |
| 5 | VIEW | VW_NSO_1 | 4057 | 7931K| 6263 (2)| 00:01:16 |
|* 6 | FILTER | | | | | |
| 7 | HASH GROUP BY | | 1 | 20285 | 6263 (2)| 00:01:16 |
| 8 | TABLE ACCESS FULL | TAB3 | 1502K| 7335K| 6193 (1)| 00:01:15 |
|* 9 | INDEX RANGE SCAN | IND_TAB3_MSR_0 | 4 | | 2 (0)| 00:00:01 |
| 10 | TABLE ACCESS BY INDEX ROWID| TAB3 | 779K| 75M| 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("COL1">=0 AND "COL1"<=449)
6 - filter(COUNT(*)=1)
9 - access("MEASURE_0"="MEASURE_0")
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
660054 consistent gets
22561 physical reads
0 redo size
44358 bytes sent via SQL*Net to client
715 bytes received via SQL*Net from client
31 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
450 rows processed
This time the query came back in 2.9 seconds (sometimes it took 3.9 seconds too).
My goal is to minimize the query execution time as much as possible. But after adding indexes or after computing statistics the query time just kept increasing. Why is this happening and how can I improve even by keeping the indexes?