0

We've got a system where we've got a collection of items (> 1 million), and several things processing it. Each processor should only process each item once, and the processors have a heirarchy.

Our current implementation is to have a 'processed' table keeping track of what each processor has already done:

CREATE TABLE items (id NUMBER PRIMARY KEY, ...)
CREATE TABLE itemsProcessed(
    item NUMBER REFERENCES items(id),
    processor NUMBER)

our query is this (itemsProcessed has got relevant indexes on it) - we're using a NOT IN to filter out items that have already been processed by the current processor or it's ancestors:

SELECT ... FROM items i WHERE <additional queries on items>
    AND id NOT IN (SELECT item FROM itemsProcessed WHERE processor IN (1, 2))

When the processed table gets very big, this query starts taking a long time (several seconds), as it has to do a lot of filtering before it starts returning the first item (the query plan is using a hash antijoin)

We need this query to return the first few items very quickly - ideally return the first items under 500ms. This means it can't iterate over items and filter out those in itemsProcessed. So we need some way of doing a negation index on the join of items and itemsProcessed (we've accomplished this on mongo, but oracle doens't seem to be able to do a similar thing)

Is this possible with Oracle?

thecoop
  • 45,220
  • 19
  • 132
  • 189
  • Could you try if this gives better results ? `AND id IN (SELECT item FROM itemsProcessed WHERE processor > 2)`. If your `processor` is never `NULL` it should return the same results but expressed without `NOT` which might better use the indexes in some cases – Thomas G Sep 28 '16 at 11:23
  • Is your foreign key indexed - is that a ''relevant" index you say you have, or have you only indexed `processor`? What kind of indexes are they? What does the execution plan show? Have you tried using `not exists` instead? And using a `rownum` stop key if you only want the first few unprocessed rows? – Alex Poole Sep 28 '16 at 11:24
  • The time query takes, also depends on what you are doing in . Are you using indexes on the table? – AL̲̳I Sep 28 '16 at 11:25
  • All the other WHERE clauses are indexed, and oracle is using the index to filter. The thing that is taking the time is the antijoin churning through lots of already-processed items – thecoop Sep 28 '16 at 11:28
  • @AlexPoole the rownum filter doesn't help - it has to go through lots of already-processed items before returning anything, and that's what is taking the time – thecoop Sep 28 '16 at 11:29

3 Answers3

1

you can try to add a /*+ first_rows */ hint to you query

SELECT /*+ first_rows (10) */... FROM items i ...

or try to select first the unprocessed items and than do <additional queries on items>

with i_to_process AS
(
  SELECT item  FROM items
  minus 
  SELECT item FROM itemsProcessed WHERE processor IN (1, 2)
)
select * from i_to_process
where 
<additional queries on items>
schurik
  • 7,798
  • 2
  • 23
  • 29
0

IMO this is a design problem. You're trying to exclude items which have already been processed when you should be trying to include items which have not been processed. The list of items which have been processed is going to grow continually; the list of items to be processed will remain small. Rather than having a table of items which have been processed (itemsProcessed) I suggest you create a table of items to be processed and then inner-join it to the query, deleting items from the ITEMS_TO_BE_PROCESSED table as they're processed.

Best of luck.

  • That is my preferred solution too, but unfortunately the list of processors is not known when each item is created. – thecoop Sep 28 '16 at 11:35
0

Depending on how often the table is updated, you could create a materialized view of itemsNotProcessed. The processing would be done beforehand. You could also denormalize a bit and add a processed flag to the items table and add a bitmap index on the flag.