3

You have a column foo, of some string type, with a index on that column. You want to SELECT from the table WHERE the foo column has the prefix 'pre'. Obviously, the index should be able to help here.

Here is the most obvious way to search by prefix:

SELECT * FROM tab WHERE foo LIKE 'pre%';

Unfortunately, this does not get optimized to use the index (in Oracle or Postgres, at least).

The following, however, does work:

SELECT * FROM tab WHERE 'pre' <= foo AND foo < 'prf';

But are there better ways to accomplish this, or are there ways of making the above more elegant? In particular:

  • I need a function from 'pre' to 'prf', but this has to work for any underlying collation. Also, it's more complicated than above, because if searching for e.g. 'prz' then the upper bound would have to be 'psa', and so on.
  • Can I abstract this into a stored function/procedure and still hit the index? So I could write something like ... WHERE prefix('pre', foo);?

Answers for all DBMSes appreciated.

jameshfisher
  • 34,029
  • 31
  • 121
  • 167
  • What? Unfortunately, this 'this does get optimized to use the index'? Was so bad about it? Of course, a 'begins with' search uses an index, if possible. Thanks god it does. .... Did I misunderstand you? – alzaimar Mar 23 '13 at 22:20
  • lol, sorry, I accidentally the word "not"! edited – jameshfisher Mar 23 '13 at 22:33
  • Oh and do you mean there is a "begins with" construct in SQL? I can't find anything about that. – jameshfisher Mar 23 '13 at 22:36
  • Yes, there is `Where Foo like 'bar%'` will enable an index. – alzaimar Mar 23 '13 at 23:08
  • It's very strange that `LIKE 'pre%'` can't use an index in those DBMSes. It is sargable in SQL Server, at least. – ErikE Mar 23 '13 at 23:15
  • @alzaimar, what DBMS are you using? On Postgres, running `create table tab (foo text not null); create index tab_ix_foo on tab(foo); explain select * from tab where foo like 'pre%';` tells me it's going to do a sequential scan. – jameshfisher Mar 23 '13 at 23:20
  • I checked that on Oracle and MSSQL. If am not too debil, even SQLite followed my instinct. And it's so obvious. – alzaimar Mar 23 '13 at 23:25
  • @ErikE, interesting. I do have a further reason to not use `LIKE`: when using a stored procedure with the prefix as a parameter, the optimizer won't be able to see that it's always a prefix search. (Presumably SQL Server only performs that optimization when given a string literal.) – jameshfisher Mar 23 '13 at 23:28
  • @alzaimar, ooh, interesting. I've found that Postgres *does* optimize it, but only for certain collations. If I create the column and index to use collation `"POSIX"`, it optimizes. If i use `"en_US"`, it doesn't. This worries me because I don't think the prefix-search problem should be affected by collation. – jameshfisher Mar 23 '13 at 23:48
  • @eegg That's not entirely true. There is "parameter sniffing". – ErikE Mar 24 '13 at 02:05

2 Answers2

0

The database is quite important here. It so happens that SQL Server does this optimization for like.

One way is to do something like this:

where foo >= 'pre' and foo <= 'pre+'~'

'~' has the largest 7-bit ASCII value of a printable character, so it is basically bigger than anything else. This however, may be a problem if you are using wide characters or a non-standard character set.

You cannot abstract this into a function, because use of a function generally precludes the use of indexes. If you are always looking at the first three characters, then in Oracle you can create an index on those three characters (something called a "function-based index").

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • I don't think `foo <= 'pre~'` works, because it cuts out values like `'pre~fix'`. If using a `<=` comparison, the value would have to be `pre~~~~~~~~~...` carrying on forever. Use of functions: I think you're right. Perhaps macros in SQL would be a nice thing, but that's pie-in-the-sky. – jameshfisher Mar 23 '13 at 23:11
-3

How about

select * from tab where foo between 'pre' and 'prf' and foo != 'prf'

this enables the index same way. The RDBMS must be pretty dumb not to use an index for that.

alzaimar
  • 4,572
  • 1
  • 16
  • 30
  • You use inclusive end range, then have to specify another condition to make the end exclusive? Why not `foo >= 'pre' AND foo < 'prf'`? – ErikE Mar 23 '13 at 23:13
  • Nice question. I haven't tried it, but from my guts it seems as if `between` is better. Hmm... Am I too much in belief where I should be more into proof? – alzaimar Mar 23 '13 at 23:24
  • Is `x BETWEEN l AND h` not just sugar for `l <= x AND x <= h`? – jameshfisher Mar 23 '13 at 23:49
  • 1
    There's no way that `BETWEEN` can be better. As @eegg said, all it is is syntactic sugar expanded to two <= expressions, just like `IN` expands to `OR` expressions. You wouldn't say `'pre' <= foo AND ' foo <= 'pre' AND foo <> 'pre'`, would you? So it's "wrong" to do it with `BETWEEN`. Perhaps in the future more testing/proof is better than "instinct". :) – ErikE Mar 24 '13 at 02:06