2

I'm testing out an Ingres Vectorwise database to try to find a faster way to run queries as we've been experiencing some slowness with relation to text lookup.

Implementation #1 below is very fast if I'm looking up about 5000 items based on a char() column. Unfortunately, it takes about the same length of time to look up 50 items as it does to look up 5000.

Implementation #2 on the other hand is very fast for 50 items, but does not scale at all for 5000 items.

Implementation #3 is obviously going to be worse than #1, but i was just providing examples of what I've tried.

My question is: Assuming my table only has two columns, q = bigint, r = char() field, can you think of any other way to write this query using basic SQL so that I can have more alternatives to choose from? I'm hoping there's an option that performs reasonably for 50 as well as 5000 (one that scales like you would expect in a database).

Note that I'll accept any answer with alternative queries that perform the same function; the more the better. I don't expect any will scale in the way I'm hoping, but I won't know till I try more.


Implementation #1:

select q
from test_table
where r in ('a', 'b', 'c', 'd', 'e')

Implementation #2:

select q
from test_table
where r = 'a' or r = 'b' or r = 'c' or r ='d' or r = 'e'

Implementation #3:

select q
from test_table a
where exists (
    select r
    from testtable
    where r in ('a', 'b', 'c', 'd', 'e')
    and a.r = r)
John Humphreys
  • 37,047
  • 37
  • 155
  • 255

2 Answers2

2

Vectorwise does not "scale like you would expect" because it has no B-tree indexes. It uses per block meta data of high and low values to choose which blocks to read instead, as do most columnar analytic DBMSs. This gives excellent performance when pulling out many rows from a very large table but poor performance pulling out a few rows as it will plough through many unneeded rows.

You can optimise by sorting your table on 'r', either manually or by using the "create index" command (this will prevent bulk loading after the first load). This will make the meta data more precise so fewer blocks will be read.

I note that there are only two columns, no joins and no aggregate functions in the example. Is a columnar database really what you need or would Lucene be more appropriate?

amk
  • 81
  • 2
  • I appreciate the input, and you're correct on all accounts. We have a need for high speed analytic processing which is why we're using Vectorwise. Unfortunately, VW is poor with text comparison, so we need to map text keys to integral keys in any cases where they were used which is why we have these two column tables. This is not the average case, but we have to handle our side cases as well. Ordering/indexing the table helped quite a bit but we're still not on the numbers we're looking for. +1 and thanks for the help :) – John Humphreys Mar 01 '13 at 16:25
1

I can only suggest a union/union all in Implementation #2 as union may be faster then OR. Implementation #1 is what I like more. It should use indexes and should be fast enough. Starting with Oracle 10g, for example, it will automatically rewrite IN subquery to use the EXISTS.

select q
from test_table
where r = 'a' 
UNION ALL
select q
from test_table
where r = 'b' 
....

UNION operator selects distinct rows. UNION ALL selects all rows including duplicates. UNION ALL is usually faster then UNION.

Art
  • 5,616
  • 1
  • 20
  • 22
  • Can you give an example of what you mean by union/union-all in this case? :) – John Humphreys Mar 01 '13 at 15:30
  • +1, will probably accept his later since its the only viable alternate query. It worked about as well as "or" did speed wise, but it doesn't scale either so can't use it; you get a error from the DB after you send a few hundred of the unions at once. – John Humphreys Mar 01 '13 at 17:09
  • w00te-The acceptance is not important for me. I'm doing this for myself, trying to learn new things, remember old... Not sure I understand about sending a few hundreds of unions to Oracle... The OR operator is equiv. to Union, it is the same thing, but faster. Even if you do not feel or see it faster. If using OR then use the parentheses where ( (r = 'a') or (r = 'b') or (r = 'c') or (r ='d')...)... – Art Mar 04 '13 at 13:25
  • It does seem to be the same thing roughly in this database (at least from a query time point of view). Why would adding parenthesis help? – John Humphreys Mar 04 '13 at 13:45
  • In your case the parentheses are optional and for readability mostly. The expression inside parentheses evaluates first. Please check this for general overview - combining OR and AND operators etc...: http://stackoverflow.com/questions/10034489/in-sql-what-does-using-parentheses-with-an-or-mean – Art Mar 04 '13 at 14:28