13

How should I index a date column when some rows has null values? We have to select rows between a date range and rows with null dates.

We use Oracle 9.2 and higher.

Options I found

  1. Using a bitmap index on the date column
  2. Using an index on date column and an index on a state field which value is 1 when the date is null
  3. Using an index on date column and an other granted not null column

My thoughts to the options are:

to 1: the table have to many different values to use an bitmap index
to 2: I have to add an field only for this purpose and to change the query when I want to retrieve the null date rows
to 3: locks tricky to add an field to an index which is not really needed

What is the best practice for this case? Thanks in advance

Some infos I have read:

Oracle Date Index
When does Oracle index null column values?

Edit

Our table has 300,000 records. 1,000 to 10,000 records are inserted and delete every day. 280,000 records have a null delivered_at date. It is a kind of picking buffer.

Our structure (translated to english) is:

create table orders
(
  orderid              VARCHAR2(6) not null,
  customerid           VARCHAR2(6) not null,
  compartment          VARCHAR2(8),
  externalstorage      NUMBER(1) default 0 not null,
  created_at           DATE not null,
  last_update          DATE not null,
  latest_delivery      DATE not null,
  delivered_at         DATE,
  delivery_group       VARCHAR2(9),
  fast_order           NUMBER(1) default 0 not null,
  order_type           NUMBER(1) default 0 not null,
  produkt_group        VARCHAR2(30)
)
Community
  • 1
  • 1
Heinz Z.
  • 1,537
  • 3
  • 14
  • 29
  • 1
    hi, how much bytes is 1 row and how many rows / what will be grow rate ? – Alexander Malakhov Jun 18 '10 at 10:25
  • 3
    After the edit: If 280,000 rows out of 300,000 rows contain null values and you want to select them all, why do you want to index the null values? Why do you want to read a book by the index if you are going to read the entire book anyway? – Rob van Wijk Jun 18 '10 at 11:27
  • @Alexander: Your question was important. Because of it I analyse the data which I should really do first. – Heinz Z. Jun 19 '10 at 04:09

4 Answers4

15

In addition to Tony's excellent advice, there is also an option to index your column in such a way that you don't need to adjust your queries. The trick is to add a constant value to just your index.

A demonstration:

Create a table with 10,000 rows out of which only 6 contain a NULL value for the a_date column.

SQL> create table mytable (id,a_date,filler)
  2  as
  3   select level
  4        , case when level < 9995 then date '1999-12-31' + level end
  5        , lpad('*',1000,'*')
  6     from dual
  7  connect by level <= 10000
  8  /

Table created.

First I'll show that if you just create an index on the a_date column, the index is not used when you use the predicate "where a_date is null":

SQL> create index i1 on mytable (a_date)
  2  /

Index created.

SQL> exec dbms_stats.gather_table_stats(user,'mytable',cascade=>true)

PL/SQL procedure successfully completed.

SQL> set autotrace on
SQL> select id
  2       , a_date
  3    from mytable
  4   where a_date is null
  5  /

        ID A_DATE
---------- -------------------
      9995
      9996
      9997
      9998
      9999
     10000

6 rows selected.


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=72 Card=6 Bytes=72)
   1    0   TABLE ACCESS (FULL) OF 'MYTABLE' (Cost=72 Card=6 Bytes=72)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        720  consistent gets
          0  physical reads
          0  redo size
        285  bytes sent via SQL*Net to client
        234  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          6  rows processed

720 consistent gets and a full table scan.

Now change the index to include the constant 1, and repeat the test:

SQL> set autotrace off
SQL> drop index i1
  2  /

Index dropped.

SQL> create index i1 on mytable (a_date,1)
  2  /

Index created.

SQL> exec dbms_stats.gather_table_stats(user,'mytable',cascade=>true)

PL/SQL procedure successfully completed.

SQL> set autotrace on
SQL> select id
  2       , a_date
  3    from mytable
  4   where a_date is null
  5  /

        ID A_DATE
---------- -------------------
      9995
      9996
      9997
      9998
      9999
     10000

6 rows selected.


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=6 Bytes=72)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'MYTABLE' (Cost=2 Card=6 Bytes=72)
   2    1     INDEX (RANGE SCAN) OF 'I1' (NON-UNIQUE) (Cost=2 Card=6)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          6  consistent gets
          0  physical reads
          0  redo size
        285  bytes sent via SQL*Net to client
        234  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          6  rows processed

6 consistent gets and an index range scan.

Regards, Rob.

Rob van Wijk
  • 17,555
  • 5
  • 39
  • 55
  • 1
    B-tree index entries only contain the non-null values. So when you have an index on just the a_date column, the index contains 9994 entries. Adding a constant makes all 10,000 rows appear in the index and thereby suitable to be used for the query. – Rob van Wijk May 13 '16 at 07:48
12

"Our table has 300,000 records.... 280,000 records have a null delivered_at date. "

In other words almost the entire table satisfies a query which searches on where DELIVERED_AT is null. An index is completely inappropriate for that search. A full table scan is much the best approach.

If you have an Enterprise Edition license and you have the CPUs to spare, using a parallel query would reduce the elapsed time.

APC
  • 144,005
  • 19
  • 170
  • 281
  • After all you absolutly right (like Rob in his comment). I vote the other answeres up. They can help in other situations and at first there was no information about the data. – Heinz Z. Jun 19 '10 at 04:12
  • @HeinzZ - That is an extremely pertinent point. Databases are all about the data. So when it comes to performance tuning volumes and distribution - especially skewed distribution - are key pieces of information. – APC Jun 19 '10 at 06:25
9

Do you mean that your queries will be like this?

select ...
from mytable
where (datecol between :from and :to
       or datecol is null);

It would only be worth indexing the nulls if they were relatively few in the table - otherwise a full table scan may be the most efficient way to find them. Assuming it is worth indexing them you could create a function-based index like this:

create index mytable_fbi on mytable (case when datecol is null then 1 end);

Then change your query to:

select ...
from mytable
where (datecol between :from and :to
       or case when datecol is null then 1 end = 1);

You could wrap the case in a function to make it slicker:

create or replace function isnull (p_date date) return varchar2
DETERMINISTIC
is
begin
    return case when p_date is null then 'Y' end;
end;
/

create index mytable_fbi on mytable (isnull(datecol));

select ...
from mytable
where (datecol between :from and :to
       or isnull(datecol) = 'Y');

I made sure the function returns NULL when the date is not null so that only the null dates are stored in the index. Also I had to declare the function as DETERMINISTIC. (I changed it to return 'Y' instead of 1 merely because to me the name "isnull" suggests it should; feel free to ignore my preference!)

Tony Andrews
  • 129,880
  • 21
  • 220
  • 259
0

Avoid the table lookup and create the index like this :

create index i1 on mytable (a_date,id) ;
user1317221_G
  • 15,087
  • 3
  • 52
  • 78