This is going to be long, so here's a quick summary to draw you in: my
top-N query with COUNT STOPKEY
and ORDER BY STOPKEY
in its plan is
still slow for no good reason.
Now, the details. It starts with a slow function. In real life it involves string manipulations with regexps. For demonstration purposes, here's an intentionally stupid recursive Fibonacci algorithm. I find it to be pretty fast for inputs up to about 25, slow around 30, and ridiculous at 35.
-- I repeat: Please no advice on how to do Fibonacci correctly.
-- This is slow on purpose!
CREATE OR REPLACE FUNCTION tmp_fib (
n INTEGER
)
RETURN INTEGER
AS
BEGIN
IF n = 0 OR n = 1 THEN
RETURN 1;
END IF;
RETURN tmp_fib(n-2) + tmp_fib(n-1);
END;
/
Now some input: a list of names and numbers.
CREATE TABLE tmp_table (
name VARCHAR2(20) UNIQUE NOT NULL,
num NUMBER(2,0)
);
INSERT INTO tmp_table (name,num)
SELECT 'Alpha', 10 FROM dual UNION ALL
SELECT 'Bravo', 11 FROM dual UNION ALL
SELECT 'Charlie', 33 FROM dual;
Here's an example of a slow query: use the slow Fibonacci function to select rows whose num generates a Fibonacci number with a doubled digit.
SELECT p.name, p.num
FROM tmp_table p
WHERE REGEXP_LIKE(tmp_fib(p.num), '(.)\1')
ORDER BY p.name;
This is true for 11 and 33, so Bravo
and Charlie
are in the output.
It takes about 5 seconds to run, almost all of which is the slow
calculation of tmp_fib(33)
. So I want to do a faster version of the
slow query by converting it to a top-N query. With N=1, it looks like
this:
SELECT * FROM (
SELECT p.name, p.num
FROM tmp_table p
WHERE REGEXP_LIKE(tmp_fib(p.num), '(.)\1')
ORDER BY p.name
)
WHERE ROWNUM <= 1;
And now it returns the top result, Bravo
. But it still takes 5 seconds
to run! The only explanation is that it's still calculating
tmp_fib(33)
, even though the result of that calculation is irrelevant
to the result. It should have been able to decide that Bravo
was going
to be output, so there's no need to test the WHERE condition for the
rest of the table.
I've thought that maybe the optimizer just needs to be told that
tmp_fib
is expensive. So I tried to tell it that, like this:
ASSOCIATE STATISTICS WITH FUNCTIONS tmp_fib DEFAULT COST (999999999,0,0);
That alters some of the cost numbers in the plan, but it doesn't make the query run faster.
Output of SELECT * FROM v$version
in case this is version-dependent:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE 11.2.0.2.0 Production
TNS for 64-bit Windows: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production
And here's the autotrace of the top-1 query. It appears to be claiming that the query took 1 second, but that's not true. It ran for about 5 seconds.
NAME NUM
-------------------- ----------
Bravo 11
Execution Plan
----------------------------------------------------------
Plan hash value: 548796432
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 55 | 4 (25)| 00:00:01 |
|* 1 | COUNT STOPKEY | | | | | |
| 2 | VIEW | | 1 | 55 | 4 (25)| 00:00:01 |
|* 3 | SORT ORDER BY STOPKEY| | 1 | 55 | 4 (25)| 00:00:01 |
|* 4 | TABLE ACCESS FULL | TMP_TABLE | 1 | 55 | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<=1)
3 - filter(ROWNUM<=1)
4 - filter( REGEXP_LIKE (TO_CHAR("TMP_FIB"("P"."NUM")),'(.)\1'))
Note
-----
- dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
27 recursive calls
0 db block gets
25 consistent gets
0 physical reads
0 redo size
593 bytes sent via SQL*Net to client
524 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
1 rows processed
UPdATE: As I mentioned in the comments, an INDEX
hint helps this query a lot. It would be good enough to be accepted as the correct answer, even though it doesn't translate well to my real-world scenario. And in an ironic twist, Oracle seems to have learned from the experience, and now chooses the INDEX
plan by default; I have to tell it NO_INDEX
to reproduce the original slow behavior.
In the real-world scenario I've applied a more complex solution, rewriting the query as a PL/SQL function. Here's how my technique looks, applied to the fib
problem:
CREATE OR REPLACE PACKAGE tmp_package IS
TYPE t_namenum IS TABLE OF tmp_table%ROWTYPE;
FUNCTION get_interesting_names (howmany INTEGER) RETURN t_namenum PIPELINED;
END;
/
CREATE OR REPLACE PACKAGE BODY tmp_package IS
FUNCTION get_interesting_names (howmany INTEGER) RETURN t_namenum PIPELINED IS
CURSOR c IS SELECT name, num FROM tmp_table ORDER BY name;
rec c%ROWTYPE;
outcount INTEGER;
BEGIN
OPEN c;
outcount := 0;
WHILE outcount < howmany LOOP
FETCH c INTO rec;
EXIT WHEN c%NOTFOUND;
IF REGEXP_LIKE(tmp_fib(rec.num), '(.)\1') THEN
PIPE ROW(rec);
outcount := outcount + 1;
END IF;
END LOOP;
END;
END;
/
SELECT * FROM TABLE(tmp_package.get_interesting_names(1));
Thanks to the responders who read the question and ran the tests and helped me understand the execution plans, and I will dispose of this question however they suggest.