1

I have an index:

CREATE INDEX BLAH ON EMPLOYEE(SUBSTR(TO_CHAR(EMPSHIRTNO), 1, 4));

and an SQL STATEMENT:

SELECT COUNT(*) 
  FROM (SELECT COUNT(*) 
          FROM EMPLOYEE 
         GROUP BY SUBSTR(TO_CHAR(EMPSHIRTNO), 1, 4) 
        HAVING COUNT(*) > 100);

but it keeps doing a full table scan instead of using the index unless I add a hint.

EMPSHIRTNO is not the primary key, EMPNO is (which isn't used here).

Complex query

EXPLAIN PLAN FOR SELECT COUNT(*) FROM (SELECT COUNT(*) FROM EMPLOYEE
                                        GROUP BY SUBSTR(TO_CHAR(EMPSHIRTNO), 1, 4)
                                       HAVING COUNT(*) > 100);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1712471557
----------------------------------------------------------------------------------
| Id  | Operation             | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |          |     1 |       |    24   (9)| 00:00:01 |
|   1 |  SORT AGGREGATE       |          |     1 |       |            |          |
|   2 |   VIEW                |          |   497 |       |    24   (9)| 00:00:01 |
|*  3 |    FILTER             |          |       |       |            |          |
----------------------------------------------------------------------------------
|   4 |     HASH GROUP BY     |          |   497 |  2485 |    24   (9)| 00:00:01 |
|   5 |      TABLE ACCESS FULL| EMPLOYEE |  9998 | 49990 |    22   (0)| 00:00:01||
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------                             
   3 - filter(COUNT(*)>100)                                                     

17 rows selected.

ANALYZE INDEX BLAH VALIDATE STRUCTURE;

SELECT BTREE_SPACE, USED_SPACE FROM INDEX_STATS;

BTREE_SPACE USED_SPACE
----------- ----------
     176032     150274

Simple query:

EXPLAIN PLAN FOR SELECT * FROM EMPLOYEE;

PLAN_TABLE_OUTPUT                                                               
--------------------------------------------------------------------------------
Plan hash value: 2913724801                                                     

------------------------------------------------------------------------------  
| Id  | Operation         | Name     | Rows  | Bytes | Cost (%CPU)| Time     |  
------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT  |          |  9998 |   439K|    23   (5)| 00:00:01 |  
|   1 |  TABLE ACCESS FULL| EMPLOYEE |  9998 |   439K|    23   (5)| 00:00:01 |  
------------------------------------------------------------------------------  

8 rows selected.

Maybe it is because the NOT NULL constraint is enforced via a CHECK constraint rather than being defined originally in the table creation statement? It will use the index when I do:

SELECT * FROM EMPLOYEE WHERE SUBSTR(TO_CHAR(EMPSHIRTNO), 1, 4) = '1234';

For those suggesting that it needs to read all of the rows anyway (which I don't think it does as it is counting), the index is not used on this either:

SELECT SUBSTR(TO_CHAR(EMPSHIRTNO), 1, 4) FROM EMPLOYEE;

In fact, putting an index on EMPSHIRTNO and performing SELECT EMPSHIRTNO FROM EMPLOYEE; does not use the index either. I should point out that EMPSHIRTNO is not unique, there are duplicates in the table.

BobTurbo
  • 141
  • 1
  • 1
  • 9
  • This is not a valid query. It has more opening than closing parentheses. – Codo Oct 03 '11 at 08:04
  • 1
    Please show the two query plans (with and without the hints). And have you updated your database statistics lately? – Codo Oct 03 '11 at 08:06
  • @Codo for some reason, I cannot even get it to use the index with hints anymore. I have updated the statistics. I think the problem is that the query doesn't use a where SUBSTR.... clause and so the database does not want to use the index. – BobTurbo Oct 03 '11 at 10:39
  • Maybe Oracle is smarter than you? (just kidding), but a FTS isn't always an evil thing. In this case it may be faster/better to do a FTS – tbone Oct 03 '11 at 11:35
  • Well it is smarter than the person telling me to do it ;) Maybe they want me to rewrite the query using the where clause.. which I am trying to figure out. – BobTurbo Oct 03 '11 at 11:43
  • A full table scan is bad in this case because all the query needs is contained within the index... – BobTurbo Oct 03 '11 at 11:46
  • Answer: I changed COUNT(*) to COUNT(SUBSTR(TO_CHAR(EMPSHIRTNO), 1, 4)) so that Oracle could catch on that it doesn't need to do a full table scan. – BobTurbo Oct 03 '11 at 11:51
  • Is empshirtno nullable? If so, then the index won't cover all rows and won't be used for counting... – Lord Peter Oct 03 '11 at 13:00
  • 1
    Oracle most likely cannot determine if the result of the function SUBSTR(TO_CHAR(xx), 1, 4) is NOT NULL. So it assume that it is nullable and the index therefore doesn't contain all rows. So it cannot and does not use it. – Codo Oct 03 '11 at 15:54
  • Is this quite a small table? (Optimiser is less likely to use index if FTS is only a few blocks.) Codo suggested posting query plans which would indicate expected cardinality. – Lord Peter Oct 03 '11 at 18:36
  • I will post some more details in the question. – BobTurbo Oct 04 '11 at 00:24
  • See my answer here: http://stackoverflow.com/questions/7358137/oracle-full-text-search-with-condition/7379753#7379753 – NullUserException Oct 04 '11 at 03:06
  • See this site: http://use-the-index-luke.com/ –  Oct 04 '11 at 13:12

3 Answers3

3

Because of the nature of your query it needs to scan every row of the table anyway. So oracle is probably deciding that a full table scan is the most efficient way to do this. Because its using a HASH GROUP BY there is no nasty sort at the end like in oracle 7 days.

First get the count per SUBSTR(...) of shirt no. Its thus first part of the query which has to scan the entire table

SELECT COUNT(*) 
FROM EMPLOYEE 
GROUP BY SUBSTR(TO_CHAR(EMPSHIRTNO), 1, 4)

Next you want to discard the SUBSTR(...) where the count is <= 100. Oracle needs to scan all rows to verify this. Technically you could argue that once it has 101 it doesn't need any more, but I don't think Oracle can work this out, especially as you are asking it what the total numer is in the SELECT COUNT(*) of the subquery.

HAVING COUNT(*) > 100);

So basically to give you the answer you want Oracle needs to scan every row in the table, so an index is no help on filtering. Because its using a hash group by, the index is no help on the grouping either. So to use the index would just slow your query down, which is why Oracle is not using it.

Sodved
  • 8,428
  • 2
  • 31
  • 43
0

I think you may need to build a function-based index on SUBSTR(TO_CHAR(EMPSHIRTNO), 1,4); Functions in your SQL have a tendency to invalidate regular indexes on a column.

Bart K
  • 684
  • 5
  • 10
0

I believe @Codo is correct. Oracle cannot determine that the expression will always be non-null, and then must assume that some nulls may not be stored in the index.

(It seems like Oracle should be able to figure out that the expression is not nullable. In general, the chance of any random SUBSTR expression always being not null is probably very low, maybe Oracle just lumps all SUBSTR expressions together?)

You can make the index usable for your query with one of these work-arounds:

--bitmap index:
create bitmap index blah on employee(substr(to_char(empshirtno), 1, 4));
--multi-column index:
alter table employee add constraint blah primary key (id, empshirtno);
--indexed virtual column:
create table employee(empshirtno varchar2(10) not null
    ,empshirtno_for_index as (substr(empshirtno,1,4)) not null );
create index blah on employee(empshirtno_for_index);
Jon Heller
  • 34,999
  • 6
  • 74
  • 132