top-N query doing too much work in spite of STOPKEY optimization

Question

This is going to be long, so here's a quick summary to draw you in: my top-N query with COUNT STOPKEY and ORDER BY STOPKEY in its plan is still slow for no good reason.

Now, the details. It starts with a slow function. In real life it involves string manipulations with regexps. For demonstration purposes, here's an intentionally stupid recursive Fibonacci algorithm. I find it to be pretty fast for inputs up to about 25, slow around 30, and ridiculous at 35.

-- I repeat: Please no advice on how to do Fibonacci correctly.
-- This is slow on purpose!
CREATE OR REPLACE FUNCTION tmp_fib (
  n INTEGER
)
  RETURN INTEGER
AS
BEGIN
  IF n = 0 OR n = 1 THEN
    RETURN 1;
  END IF;
  RETURN tmp_fib(n-2) + tmp_fib(n-1);
END;
/

Now some input: a list of names and numbers.

CREATE TABLE tmp_table (
  name VARCHAR2(20) UNIQUE NOT NULL,
  num NUMBER(2,0)
);
INSERT INTO tmp_table (name,num)
  SELECT 'Alpha',    10 FROM dual UNION ALL
  SELECT 'Bravo',    11 FROM dual UNION ALL
  SELECT 'Charlie',  33 FROM dual;

Here's an example of a slow query: use the slow Fibonacci function to select rows whose num generates a Fibonacci number with a doubled digit.

SELECT p.name, p.num
FROM tmp_table p
WHERE REGEXP_LIKE(tmp_fib(p.num), '(.)\1')
ORDER BY p.name;

This is true for 11 and 33, so Bravo and Charlie are in the output. It takes about 5 seconds to run, almost all of which is the slow calculation of tmp_fib(33). So I want to do a faster version of the slow query by converting it to a top-N query. With N=1, it looks like this:

SELECT * FROM (
  SELECT p.name, p.num
  FROM tmp_table p
  WHERE REGEXP_LIKE(tmp_fib(p.num), '(.)\1')
  ORDER BY p.name
)
WHERE ROWNUM <= 1;

And now it returns the top result, Bravo. But it still takes 5 seconds to run! The only explanation is that it's still calculating tmp_fib(33), even though the result of that calculation is irrelevant to the result. It should have been able to decide that Bravo was going to be output, so there's no need to test the WHERE condition for the rest of the table.

I've thought that maybe the optimizer just needs to be told that tmp_fib is expensive. So I tried to tell it that, like this:

ASSOCIATE STATISTICS WITH FUNCTIONS tmp_fib DEFAULT COST (999999999,0,0);

That alters some of the cost numbers in the plan, but it doesn't make the query run faster.

Output of SELECT * FROM v$version in case this is version-dependent:

Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE    11.2.0.2.0      Production
TNS for 64-bit Windows: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production

And here's the autotrace of the top-1 query. It appears to be claiming that the query took 1 second, but that's not true. It ran for about 5 seconds.

NAME                        NUM
-------------------- ----------
Bravo                        11


Execution Plan
----------------------------------------------------------
Plan hash value: 548796432

-------------------------------------------------------------------------------------
| Id  | Operation               | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |           |     1 |    55 |     4  (25)| 00:00:01 |
|*  1 |  COUNT STOPKEY          |           |       |       |            |          |
|   2 |   VIEW                  |           |     1 |    55 |     4  (25)| 00:00:01 |
|*  3 |    SORT ORDER BY STOPKEY|           |     1 |    55 |     4  (25)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL   | TMP_TABLE |     1 |    55 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=1)
   3 - filter(ROWNUM<=1)
   4 - filter( REGEXP_LIKE (TO_CHAR("TMP_FIB"("P"."NUM")),'(.)\1'))

Note
-----
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
         27  recursive calls
          0  db block gets
         25  consistent gets
          0  physical reads
          0  redo size
        593  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          1  rows processed

UPdATE: As I mentioned in the comments, an INDEX hint helps this query a lot. It would be good enough to be accepted as the correct answer, even though it doesn't translate well to my real-world scenario. And in an ironic twist, Oracle seems to have learned from the experience, and now chooses the INDEX plan by default; I have to tell it NO_INDEX to reproduce the original slow behavior.

In the real-world scenario I've applied a more complex solution, rewriting the query as a PL/SQL function. Here's how my technique looks, applied to the fib problem:

CREATE OR REPLACE PACKAGE tmp_package IS
  TYPE t_namenum IS TABLE OF tmp_table%ROWTYPE;
  FUNCTION get_interesting_names (howmany INTEGER) RETURN t_namenum PIPELINED;
END;
/

CREATE OR REPLACE PACKAGE BODY tmp_package IS
  FUNCTION get_interesting_names (howmany INTEGER) RETURN t_namenum PIPELINED IS
    CURSOR c IS SELECT name, num FROM tmp_table ORDER BY name;
    rec c%ROWTYPE;
    outcount INTEGER;
  BEGIN
    OPEN c;
    outcount := 0;
    WHILE outcount < howmany LOOP
      FETCH c INTO rec;
      EXIT WHEN c%NOTFOUND;
      IF REGEXP_LIKE(tmp_fib(rec.num), '(.)\1') THEN
        PIPE ROW(rec);
        outcount := outcount + 1;
      END IF;
    END LOOP;
  END;
END;
/

SELECT * FROM TABLE(tmp_package.get_interesting_names(1));

Thanks to the responders who read the question and ran the tests and helped me understand the execution plans, and I will dispose of this question however they suggest.

Notice where the `filter` is occurring - _before_ the sort. You can't assume that `11` is the first row that will be checked for validity. — Mat, May 21 '13 at 21:39
@Mat so this is my question then: how do I convince oracle to do the filter after the sort? — , May 21 '13 at 21:42
Without evaluating the `where` clause for every row in the table, how would it know that `Bravo` isn't going to be filtered out? — Alex Poole, May 21 '13 at 21:42
Can't try it right now, but you'd need to select all from your table ordered by num asc as an inner query, then filter that with your where, then restrict that with rownum to get a chance at testing the smaller values first. — Mat, May 21 '13 at 21:44
@AlexPoole if the sort was done first, the filter function would be called on Alpha first, then Bravo, then the stopkey would be hit and no more filter tests would be needed. — , May 21 '13 at 21:46
@Mat the "smaller values are faster" thing is feature of my demo function, not the real one. The real one isn't so predictable. — , May 21 '13 at 21:47
Well, yeah, reality always gets in the way. Can't really help you with your real problem if what you have above isn't representative though... (But if the ordering by name would work, try that, in an inner query.) — Mat, May 21 '13 at 21:49
It would take a book-length post to explain all the limitations of the reality behind this question. But think of "Charlie" as a placeholder for a million rows, each of which takes a small but noticeable amount of time to test. The only reasonable approach is to sort first, then filter. Can Oracle do that, or not? I'm currently trying out the alternative query with the inner query separated into a sorting part and a filtering part — , May 21 '13 at 21:57
But sort by what? You seem to want to preempt the function call result. Oddly when I run this under 11.2.0.3 I get a `SORT ORDER BY` rather than a `SORT ORDER BY STOPKEY` and the query drops from 7 seconds (for all three rows) to 0.1 seconds (`Bravo` only). Putting in another sub-select to order before the `where` clause reverts it back to what you're seeing. — Alex Poole, May 21 '13 at 22:01
@AlexPoole I'm sorting by name, which is a plain old `VARCHAR2` field. The function call result is not needed for the sort. Delaying the function call until after the sort is exactly what I need. Very interesting that you got a different result than I did though! Did yours still have a `FULL` table access? — , May 21 '13 at 22:10

Alex Poole · Answer 1 · 2013-05-22T07:33:53.183

Follow-up to comment as this is too big. Running under 11.2.0.3 (OEL), your query:

SELECT * FROM (
  SELECT p.name, p.num
  FROM tmp_table p
  WHERE REGEXP_LIKE(tmp_fib(p.num), '(.)\1')
  ORDER BY p.name
)
WHERE ROWNUM <= 1;

NAME                        NUM
-------------------- ----------
Bravo                        11 

Elapsed: 00:00:00.094
Plan hash value: 1058933870

----------------------------------------------------------------------------------
| Id  | Operation            | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |           |     1 |    25 |     4  (25)| 00:00:01 |
|*  1 |  COUNT STOPKEY       |           |       |       |            |          |
|*  2 |   VIEW               |           |     3 |    75 |     4  (25)| 00:00:01 |
|   3 |    SORT ORDER BY     |           |     3 |    75 |     4  (25)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| TMP_TABLE |     3 |    75 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=1)
   2 - filter( REGEXP_LIKE (TO_CHAR("TMP_FIB"("NUM")),'(.)\1'))

Note
-----
   - dynamic sampling used for this statement (level=2)

Note the variation in the SORT ORDER BY from what you saw, and the corresponding rows values. Moving the order-by into a sub-select looks more like yours:

SELECT * FROM (
  SELECT * FROM (
    SELECT p.name, p.num
    FROM tmp_table p
    ORDER BY p.name
  )
  WHERE REGEXP_LIKE(tmp_fib(num), '(.)\1')
)
WHERE ROWNUM <= 1;

NAME                        NUM
-------------------- ----------
Bravo                        11 

Elapsed: 00:00:07.894
Plan hash value: 548796432

-------------------------------------------------------------------------------------
| Id  | Operation               | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |           |     1 |    25 |   171  (99)| 00:00:03 |
|*  1 |  COUNT STOPKEY          |           |       |       |            |          |
|   2 |   VIEW                  |           |     1 |    25 |   171  (99)| 00:00:03 |
|*  3 |    SORT ORDER BY STOPKEY|           |     1 |    25 |   171  (99)| 00:00:03 |
|*  4 |     TABLE ACCESS FULL   | TMP_TABLE |     1 |    25 |   170  (99)| 00:00:03 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=1)
   3 - filter(ROWNUM<=1)
   4 - filter( REGEXP_LIKE (TO_CHAR("TMP_FIB"("P"."NUM")),'(.)\1'))

Note
-----
   - dynamic sampling used for this statement (level=2)

Don't know how helpful or practical this would be in your real scenario, but in this case (in my environment, anyway), adding an index across all fetched columns - to get a full index scan instead of a full table scan - seems to change the behaviour:

CREATE INDEX tmp_index ON tmp_table(name, num);

index TMP_INDEX created.

SELECT * FROM (
  SELECT p.name, p.num
  FROM tmp_table p
  WHERE REGEXP_LIKE(tmp_fib(p.num), '(.)\1')
  ORDER BY p.name
)
WHERE ROWNUM <= 1;

NAME                        NUM
-------------------- ----------
Bravo                        11 

Elapsed: 00:00:00.093
Plan hash value: 1841475998

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |     1 |    25 |     1   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY    |           |       |       |            |          |
|*  2 |   VIEW            |           |     3 |    75 |     1   (0)| 00:00:01 |
|   3 |    INDEX FULL SCAN| TMP_INDEX |     3 |    75 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=1)
   2 - filter( REGEXP_LIKE (TO_CHAR("TMP_FIB"("NUM")),'(.)\1'))

Note
-----
   - dynamic sampling used for this statement (level=2)

SELECT * FROM (
  SELECT * FROM (
    SELECT p.name, p.num
    FROM tmp_table p
    ORDER BY p.name
  )
  WHERE REGEXP_LIKE(tmp_fib(num), '(.)\1')
)
WHERE ROWNUM <= 1;

NAME                        NUM
-------------------- ----------
Bravo                        11 

Elapsed: 00:00:00.093
Plan hash value: 1841475998

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |     1 |    25 |     1   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY    |           |       |       |            |          |
|   2 |   VIEW            |           |     1 |    25 |     1   (0)| 00:00:01 |
|*  3 |    INDEX FULL SCAN| TMP_INDEX |     1 |    25 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=1)
   3 - filter( REGEXP_LIKE (TO_CHAR("TMP_FIB"("P"."NUM")),'(.)\1'))

Note
-----
   - dynamic sampling used for this statement (level=2)

Incidentally, aftr I've run this threw several times with any of the rownum variants I eventually start getting ORA-01000: maximum open cursors exceeded errors. I'm dropping the objects at the end of each run but staying connected. I think that suggests another bug somewhere, though probably not related to what you're seeing since it happens even with the index scan.

Looks like you're getting the optimization that I want, the filter is moved up above the sort. I wonder if it's a difference between 11.2.0.2 and 11.2.0.3 — , May 21 '13 at 22:28
Quite possibly an optimizer tweak in the patchset, yes. I was hoping I could think of a hint that would make my query behave like yours in my env, and thus might make yours behave how you want, but haven't come up with anything yet. — Alex Poole, May 21 '13 at 22:33
I suspect that you'd want to enforce this behaviour with a NO_PUSH_PRED optimizer hint. — David Aldridge, May 22 '13 at 07:39
@DavidAldridge - I tried that on the original queries (without the extra index) and I couldn't make it have any effect, possibly through incorrect use or because there's no join for it to act on. I'm just throwing out ideas for the OP to consider though, I don't have any confidence about what will work/be stable for the real situation (or even this test, TBH). — Alex Poole, May 22 '13 at 08:21
@AlexPoole Sorry I meant in particular for the version with two nested in-line views -- the second code block. — David Aldridge, May 22 '13 at 09:01
Trying out some of the things mentioned here, I found that I can force an index scan on the name field by using a hint `/*+ INDEX(p (name)) */` and get a plan with no `SORT` which runs fast. I can't find any variation of `PUSH_PRED` or `NO_PUSH_PRED` that has any effect. — , May 22 '13 at 20:30

score 1 · Answer 2 · answered Jun 03 '13 at 21:34

Interest apparently died out, so I'm just going to summarize the possible solutions in a self-answer.

upgrade - newer Oracle seems to optimize this type of query better.
Use an INDEX hint to make the inner query retrieve the rows in already-sorted order, which enables the STOPKEY to work properly.
rewrite in PL/SQL, with the inner query as a cursor. fetch from the cursor until you get enough matches, then close it.

top-N query doing too much work in spite of STOPKEY optimization

2 Answers2