
I have a table EMPLOYEE with the following columns in my MySQL (InnoDB) database:

  • internal_employee_id (auto incrementing PK)
  • external_employee_id
  • name
  • gender
  • exported (boolean field)

In a distributed system I want to ensure that multiple processes in the cluster each read the top 100 distinct rows from the table for which the exported column is set to false. The rows read by a process should remain locked while it works on them, so that if process1 reads rows 1-100, process2 cannot see rows 1-100 and instead picks up the next available 100 rows.

For this, I tried using PESSIMISTIC_WRITE locks, but they don't seem to serve the purpose. They do block multiple processes from updating at the same time, but multiple processes can still read the same locked rows.

I tried the following Java code:

    Query query = entityManager.createNativeQuery("select * from employee " +
        "where exported = 0 limit 100 for update");
    List<Employee> employeeListLocked = query.getResultList();
  

EDIT: Found the answer finally

What I needed was to use the "Skip Locked" feature. So my updated code has become:

    Query query = entityManager.createNativeQuery("select * from employee " +
        "where exported = 0 limit 100 for update skip locked");

With the help of 'skip locked', all rows that are in a locked state are ignored/skipped by the DB engine when running a SELECT. Hope this helps you all.
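
For completeness, the full claim-and-mark cycle at the SQL level might look roughly like this (a sketch only; the id list in the UPDATE stands in for whatever ids the SELECT actually returned, and marking via the exported flag is an assumption about how the processing step ends):

    -- one worker's cycle, in its own transaction
    START TRANSACTION;

    -- claim up to 100 unexported rows, skipping rows already locked by other workers
    SELECT * FROM employee
    WHERE exported = 0
    LIMIT 100
    FOR UPDATE SKIP LOCKED;

    -- ... process the claimed rows in the application ...

    -- mark the claimed rows so no later worker picks them up again
    UPDATE employee
    SET exported = 1
    WHERE internal_employee_id IN (1, 2, 3);  -- placeholder: the ids returned by the SELECT above

    -- committing releases the row locks
    COMMIT;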

Murtaza Hasan
4 Answers


You could add a new column to the table, for example a column named 'processed' (boolean field), and initialize all the records with the false value:

update EMPLOYEE set processed = 0;

When a process starts, in the same transaction, you can select for update and then set the processed column to 1 on those 100 rows.

    Query query = entityManager.createNativeQuery("select * from employee " +
            "where exported = 0 and processed = 0 " +
            "order by internal_employee_id desc limit 100 for update");
    List<Employee> employeeListLocked = query.getResultList();

Then make an update on these 100 rows:

UPDATE EMPLOYEE eUpdate INNER JOIN (select internal_employee_id
       from EMPLOYEE where exported = 0 and processed = 0
       order by internal_employee_id desc limit 100) e
     ON eUpdate.internal_employee_id = e.internal_employee_id
       SET eUpdate.processed = 1;

Then the next process will not pick up the same list.
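
Put together, both statements have to run inside one transaction so that the locks taken by the SELECT are held until the processed flag is committed; a sketch of the sequence:

    START TRANSACTION;

    -- 1. lock the candidate rows
    SELECT * FROM employee
    WHERE exported = 0 AND processed = 0
    ORDER BY internal_employee_id DESC
    LIMIT 100
    FOR UPDATE;

    -- 2. flag the same rows as taken
    UPDATE EMPLOYEE eUpdate
    INNER JOIN (SELECT internal_employee_id
                FROM EMPLOYEE
                WHERE exported = 0 AND processed = 0
                ORDER BY internal_employee_id DESC
                LIMIT 100) e
        ON eUpdate.internal_employee_id = e.internal_employee_id
    SET eUpdate.processed = 1;

    -- 3. commit: the row locks are released only now, after the flag is set
    COMMIT;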

Periklis Douvitsas
  • What happens when the processes are concurrent? – Murtaza Hasan Dec 30 '20 at 16:39
  • The first process will lock and update the rows so the next one will read the next rows – Periklis Douvitsas Dec 30 '20 at 16:48
  • So what you mean is something like: 1. A process queries the top 100 records. 2. It updates them and sets the flag processed = T. What happens if, during step 1, another process runs concurrently and does the same? I guess I have got my answer, which is to use the recently introduced 'SKIP LOCKED' feature, so my query would become: "select * from employee " + "where exported = 0 and processed = 0 order by internal_employee_id desc limit 100 for update skip locked" – Murtaza Hasan Dec 30 '20 at 17:13
  • Yes, exactly, you can use SKIP LOCKED as well. In addition, if SKIP LOCKED is not used and multiple processes run at the same time, then the other transactions trying to access those rows are forced to wait for the transaction that locked the rows to finish. – Periklis Douvitsas Dec 30 '20 at 21:30

There are a couple of ways to block reads:

The session that wants to update the table first does:

LOCK TABLES employee WRITE;

This acquires an exclusive metadata lock on the table. Then other sessions are blocked, even if they only try to read that table. They must wait for a metadata lock. See https://dev.mysql.com/doc/refman/8.0/en/lock-tables.html for more information on this.
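
A minimal sketch of that pattern (the SELECT and UPDATE between the LOCK and UNLOCK are only placeholders for whatever work the session does while it holds the table lock):

    -- take an exclusive table lock; other sessions now block even on plain SELECTs
    LOCK TABLES employee WRITE;

    -- ... read and update rows here, e.g. ...
    SELECT * FROM employee WHERE exported = 0 LIMIT 100;
    UPDATE employee SET exported = 1 WHERE exported = 0 LIMIT 100;

    -- release the lock so other sessions can use the table again
    UNLOCK TABLES;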

The downside of table locks is that they lock the whole table. There's no way to use this to lock individual rows or sets of rows.

Another solution is that you must code all reads to require a shared lock:

SELECT ... FROM employee WHERE ... LOCK IN SHARE MODE;

MySQL 8.0 changes the syntax, but it works the same way:

SELECT ... FROM employee WHERE ... FOR SHARE;

These are not metadata locks, they're row locks. So you can lock individual rows or sets of rows.

A request for a shared lock on some rows won't conflict with other shared locks on those rows, but if there's an exclusive lock on the rows, the SELECT FOR SHARE waits. The reverse is true too -- if there's any SELECT FOR SHARE on the rows uncommitted, the request for exclusive lock waits.

The downside of this method is that it only works if all queries that read that table have the FOR SHARE option.
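
To illustrate how the two kinds of locks interact, assuming both sessions touch the same rows:

    -- session A: the process that wants exclusive access to the batch
    START TRANSACTION;
    SELECT * FROM employee WHERE exported = 0 LIMIT 100 FOR UPDATE;

    -- session B: a reader coded with a shared lock, started while A is still open
    START TRANSACTION;
    SELECT * FROM employee WHERE exported = 0 LIMIT 100 FOR SHARE;
    -- session B waits here until session A commits or rolls back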

All that said, I post this just to answer your question directly. I do think that the system described in the answer from Periklis is good. I implemented a system like that recently, and it works.

Sometimes the implementation you have in mind is not the best solution, and you need to consider another way to solve the problem.

Bill Karwin
-- In a transaction by itself:
UPDATE t
    SET who_has = $me   -- some indicator of the process locking the rows
    WHERE who_has IS NULL
    LIMIT 100;

-- Grab some or all rows that you have and process them.
-- You should not need to lock them further (at least not for queue management)
SELECT ... WHERE who_has = $me ...

-- Eventually, release them, either one at a time, or all at once.
-- Here's the bulk release:
UPDATE t SET who_has = NULL
    WHERE who_has = $me;
-- Again, this UPDATE is in its own transaction.

Note that this general mechanism has no limitations on how long it takes to "process" the items.

Also, the extra who_has column helps you if there is a crash that leaves items unreleased. It should be augmented by a timestamp of when the items were grabbed. A cron job (or equivalent) should look for any unprocessed items that have been locked for "too long".
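
For example, assuming the grab time is recorded in a hypothetical grabbed_at column and "too long" means ten minutes (both details made up for illustration), the cron job could run something like:

    -- release items that were grabbed more than 10 minutes ago and never freed
    -- (grabbed_at and the 10-minute threshold are illustrative, not part of the answer above)
    UPDATE t
        SET who_has = NULL, grabbed_at = NULL
        WHERE who_has IS NOT NULL
          AND grabbed_at < NOW() - INTERVAL 10 MINUTE;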

Rick James

FOUND THE ANSWER:

What I needed was to use the "Skip Locked" feature. So my updated code has become:

    Query query = entityManager.createNativeQuery("select * from employee " +
        "where exported = 0 limit 100 for update skip locked");
    List<Employee> employeeListLocked = query.getResultList();

With the help of 'skip locked', all rows that are in a locked state are ignored/skipped by the DB engine when running a SELECT. Hope this helps you all.

Murtaza Hasan