
How would you correctly iterate over all the rows of a database query when doing multiple passes of a long operation, without missing any row? In my case I have alphanumeric IDs, so the id column is not reliable...

The main problem is that the query results may change between each cron task execution.

A first idea could be to store the offset / "number of rows processed". But if one of the already processed rows is deleted the others would shift and the first one would be skipped.

If I store the id of the last processed row and skip to the one after it I have a worse problem: if that exact row is deleted the next cron job will skip every remaining row.

Is there any more "standard" way of iterating a table like this?

I'm doing this in PHP with a third-party ORM engine using SQL as the data store, but I think it's more of a general question, so anyone feel free to retag this more appropriately.

beppe9000
  • If you order your SELECT by the ID asc (for example) and store the last ID processed. The next time just use WHERE ID > storedID – Nigel Ren Jun 02 '20 at 18:35
  • Ideally you write this entirely as a query or stored procedure so that you can use the SQL engine's guarantees that you're not going to miss any rows. Failing that, you'd probably want to add a timestamp column that records the last time a row was "processed". – Sammitch Jun 02 '20 at 18:42
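The keyset approach suggested in the comments can be sketched like this (a sketch, assuming the alphanumeric IDs sort consistently; `:last_processed_id` is a placeholder for the value you stored after the previous run):

SELECT *
FROM mytable
WHERE id > :last_processed_id  -- strictly greater than the last stored ID
ORDER BY id
LIMIT 100;

Because the condition is a range comparison rather than a lookup of the stored row, it still works if that exact row has since been deleted: the scan simply resumes at the next surviving ID.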

2 Answers


Most DBMS have a reliable standard sorting. So the easiest solution would be not to use any sorting and rely on LIMIT and OFFSET definitions. E.g. in the case of MySQL you could do something like this:

SELECT *
FROM mytable
LIMIT 10, 10; -- skip the first 10 rows, return the next 10

This should be reliable.

If you would like to take deleted rows into consideration, I would use some sort of queue. E.g.:

CREATE TABLE item(
    id INT PRIMARY KEY
);

CREATE TABLE job (
    id INT PRIMARY KEY
);

CREATE TABLE job_queue(
    id INT PRIMARY KEY,
    job_id INT,
    item_id INT,
    FOREIGN KEY (job_id) REFERENCES job(id),
    FOREIGN KEY (item_id) REFERENCES item(id)
);

Then you could add all items to the queue by running:

INSERT INTO job_queue (job_id, item_id)
SELECT 1, id
FROM item;

Now you could fetch the items reliably by running the query:

SELECT *
FROM job_queue
WHERE job_id = 1
ORDER BY id
LIMIT 10;

Once a batch has been processed, delete exactly that batch from the queue (MySQL allows ORDER BY and LIMIT in a single-table DELETE):

DELETE FROM job_queue
WHERE job_id = 1
ORDER BY id
LIMIT 10;
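If two cron runs could overlap, the fetch and the delete can be wrapped in a transaction with a locking read so that a concurrent run cannot grab the same batch (a sketch, assuming MySQL/InnoDB):

START TRANSACTION;

SELECT *
FROM job_queue
WHERE job_id = 1
ORDER BY id
LIMIT 10
FOR UPDATE;  -- lock this batch until COMMIT; a concurrent run waits

DELETE FROM job_queue
WHERE job_id = 1
ORDER BY id
LIMIT 10;

COMMIT;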

But there are many possible valid solutions. The final answer depends heavily on your requirements.

mscho
    Not sure about using LIMIT without ORDER BY https://stackoverflow.com/questions/34812412/is-it-safe-to-use-limit-without-order-by – Nigel Ren Jun 02 '20 at 18:34
  • I would guess it depends. Some DBMS have an internal row id, which is used in the case of a missing ORDER BY. As far as I know, the following DBMS are safe regarding this aspect: MySQL, PostgreSQL, MSSQL – mscho Jun 02 '20 at 18:43

Well, you can use DB transactions to resolve concurrency if two cron runs hit simultaneously. To know which row to start from next, you can add an extra column in the DB, say processed, which stores some integer flag, say 1. Your next cron run will then fetch only the rows that don't have 1 in the processed column.
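As a sketch of that approach (the table name mytable and the batch size are assumptions; `:current_id` is a placeholder):

ALTER TABLE mytable ADD COLUMN processed TINYINT NOT NULL DEFAULT 0;

-- each cron run picks up only rows not yet handled
SELECT *
FROM mytable
WHERE processed = 0
LIMIT 100;

-- mark each row once its work is done
UPDATE mytable SET processed = 1 WHERE id = :current_id;

Because the flag lives on the row itself, a deleted row simply disappears from the result set: nothing shifts and no remaining row gets skipped.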

nice_dev