0

I have a situation where I have multiple (potentially hundreds) threads repeating the same task (using a java scheduled executor, if you are curious). This task entails selecting rows of changes (from a table called change) that have not yet been processed (processed changes are kept track in a m:n join table called process_change_rel that keeps track of the process id, record id and status) processing them, then updating back the status.

My question is, how is the best way to prevent two threads from the same process from selecting the same row? Will the below solution (using for update to lock rows ) work? If not, please suggest a working solution

Create table change(
—id , autogenerated pk 
—other fields
)
Create table change_process_rel(
—change id (pk of change table)
—process id (pk of process table)
—status)

Query I would use is listed below

Select * from 
change c
where c.id not in(select changeid from change_process_rel with cs) for update

Please let me know if this would work

mustaccio
  • 18,234
  • 16
  • 48
  • 57
Connor Butch
  • 648
  • 1
  • 10
  • 28

1 Answers1

0

You have to "lock" a row which you are going to process somehow. Such a "locking" should be concurrent of course with minimum conflicts / errors.
One way is as follows:

Create table change
(
  id int not null generated always as identity
, v varchar(10)
) in userspace1;

insert into change (v) values '1', '2', '3';

Create table change_process_rel
(
  id int not null
, pid int not null
, status int not null
) in userspace1;
create unique index change_process_rel1 on change_process_rel(id);

Now you should be able to run the same statement from multiple concurrent sessions:

SELECT ID
FROM NEW TABLE
(
insert into change_process_rel (id, pid, status) 
select c.id, mon_get_application_handle(), 1
from change c 
where not exists (select 1 from change_process_rel r where r.id = c.id) 
fetch first 1 row only
with ur
);

Every such a statement inserts 1 or 0 rows into the change_process_rel table, which is used here as a "lock" table. The corresponding ID from change is returned, and you may proceed with processing of the corresponding event in the same transaction.
If the transaction completes successfully, then the row inserted into the change_process_rel table is saved, so, the corresponding id from change may be considered as processed. If the transaction fails, the corresponding "lock" row from change_process_rel disappears, and this row may be processed later by this or another application.
The problem of this method is, that when both tables become large enough, such a sub-select may not work as quick as previously.

Another method is to use Evaluate uncommitted data through lock deferral.

It requires to place the status column into the change table. Unfortunately, Db2 for LUW doesn't have SKIP LOCKED functionality, which might help with such a sort of algorithms.
If, let's say, status=0 is "not processed", and status<>0 is some processing / processed status, then after setting these DB2_EVALUNCOMMITTED and DB2_SKIP* registry variables and restart the instance, you may "catch" the next ID for processing with the following statement.

SELECT ID
FROM NEW TABLE
(
  update
  (
  select id, status
  from change
  where status=0
  fetch first 1 row only
  )
  set status=1
);

Once you get it, you may do further processing of this ID in the same transaction as previously.
It's good to create an index for performance: create index change1 on change(status); and may be set this table as volatile or collect distribution statistics on this column in addition to regular statistics on table and its indexes periodically.
Note that such a registry variables setting has global effect, and you should keep it in mind...

Community
  • 1
  • 1
Mark Barinstein
  • 11,456
  • 2
  • 8
  • 16