Oracle database help optimizing LIKE searches

Question

I am on Oracle 11g and we have these 3 core tables:

Customer - CUSTOMERID|DOB
CustomerName - CUSTOMERNAMEID|CustomerID|FNAME|LNAME
Address - ADDRESSID|CUSTOMERID|STREET|CITY|STATE|POSTALCODE

I have about 60 million rows on each of the tables and the data is a mix of US and Canadian population.

I have a front-end application that calls a web service and they do a last name and partial zip search. So my query basically has

where CUSTOMERNAME.LNAME = ? and ADDRESS.POSTALCODE LIKE '?%'

They typically provide the first 3 digits of the zip.

The address table has an index on all street/city/state/zip and another one on state and zip.

I did try adding an index exclusively for the zip and forced oracle to use that index on my query but that didn't make any difference.

For returning about 100 rows (I have pagination to only return 100 at a time) it takes about 30 seconds which isn't ideal. What can I do to make this better?

You need to show your query. My guess is that the tables are joined on CUSTOMERID, so CUSTOMERID s/b indexed on each table. — Brian Leach, Oct 17 '17 at 21:55
The issue is that ZIP codes are all digits (if you are talking about the US) but they are stored as strings (not much of a choice in that, really). Even with an index, without more help Oracle doesn't know that between 099 and 100 it doesn't need to look for 09A and 09W. So its cardinality estimates will be way off. One way to help Oracle is to add histograms. https://docs.oracle.com/database/121/TGSQL/tgsql_histo.htm#TGSQL366 — , Oct 18 '17 at 00:10
Then: LNAME should be much more selective than POSTALCODE, but you didn't mention an index on LNAME. Is there one? — , Oct 18 '17 at 00:12
@mathguy I'm not sure. The average population per zip code is only like 7 or 8 thousand. Compare that with number of Smiths or Johnsons in the U.S. I think he needs a little design change to get LNAME and POSTALCODE materialized in the same table somehow. — Matthew McPeak, Oct 18 '17 at 02:16
Why do you have a separate table for `CustomerName`? I assume `Customer.DOB` is "date of birth"? How can a customer have multiple names but one date of birth? — Matthew McPeak, Oct 18 '17 at 02:24
@MatthewMcPeak - In general we should not draw any conclusion from comparing the average of one thing against the maximum of another. But you are right: even the most populous zip codes only have about 100k residents, but there are over 2 million Smith's in the U.S. So that is a valid point... even with an index on name, a histogram would still help (on LNAME this time, rather than POSTALCODE). — , Oct 18 '17 at 03:30
Thanks all. 1. Yes CustomerID is indexed 2. As for normalization of keeping names and DOBs apart - the database is part of a commercial product we bought. And it suits our business needs - a customer can have multiple names but if they have different DOBs that's a problem and we don't treat them the same. — user3726933, Oct 18 '17 at 14:06
And yes lastname has its own index. Completely agree that a materialized view or putting things together on a single table etc would yield much better results but I was looking for some quick win solution if at all possible. Would it be a good idea to add three new columns in the address table for Zip3 / Zip4 / Zip5 and auto populate them on each insert and use those columns to query on the web service layer based on the length of the zip code passed? — user3726933, Oct 18 '17 at 14:07

fhossfel · Answer 1 · 2017-10-17T23:02:13.510

The problem is that the filters you are applying are not very selective and they apply to different tables. This is bad for an old-fashioned btree index. If the content is very static you could try bitmap indexes. More precisely a function based bitmap join index on the first three letter of the last name and a bitmap join index on the postal code column. This assumes that very few people with the whose last name starts with certain letters live in an are with a certain postal code.

CREATE BITMAP INDEX ix_customer_custname ON customer(SUBSTR(cn.lname,1,3))
FROM customer c, customername cn
WHERE c.customerid = cn.customerid;

CREATE BITMAP INDEX ix_customer_postalcode ON customer(SUBSTR(a.postalcode,1,3))
FROM customer c, address a
WHERE c.customerid = a.customerid;

If you are successful you should see the two bitmap indexes becoming AND connected. The execution time should drop to a couple of seconds. It will not be as fast as a btree index.

Remarks:

You may have to play around a bit whether it is more efficient to make one or two indexes and whether the function are helpful useful.
If you decide to do it function based you should include the exact same function calls in the where clause of your query. Otherwise the index will not be used.
DML operations will be considerably slower. This is only useful for tables with static data. Note that DML operations will block whole row "ranges". Concurrent DML operations will run into problems.
Response time will probably still be seconds not instanteously like a BTREE index.
AFAIK this will work only on the enterprise edition. The syntax is untested because I do not have an enterprise db available at the moment.
If this is still not fast enough you can create a materialized view with customerid, last name and postal code and but a btree index on it. But that is kind of expensive, too.

I would recommend against a `BITMAP` index if there is any level of concurrent writing to the tables. — Matthew McPeak, Oct 17 '17 at 22:56
I agree 100%. Once you have bitmap indexes on your table you do not have row locking anymore. Running parallel DML operations on that is asking for trouble. — fhossfel, Oct 17 '17 at 23:00
Thanks. Yes there will be concurrent writing. But BITMAP index sounds like a good idea for another use case we have. — user3726933, Oct 18 '17 at 14:10

Oracle database help optimizing LIKE searches

1 Answers1