Assume there are two tables TST_SAMPLE (10000 rows)
and TST_SAMPLE_STATUS (empty)
.
I want to iterate over each record in TST_SAMPLE
and add exactly one record to TST_SAMPLE_STATUS
accordingly.
In a single thread that would be simply this:
begin
for r in (select * from TST_SAMPLE)
loop
insert into TST_SAMPLE_STATUS(rec_id, rec_status)
values (r.rec_id, 'TOUCHED');
end loop;
commit;
end;
/
In a multithreaded solution there's a situation, which is not clear to me.
So could you explain what causes processing one row of TST_SAMPLE
several times.
Please, see details below.
create table TST_SAMPLE(
rec_id number(10) primary key
);
create table TST_SAMPLE_STATUS(
rec_id number(10),
rec_status varchar2(10),
session_id varchar2(100)
);
begin
insert into TST_SAMPLE(rec_id)
select LEVEL from dual connect by LEVEL <= 10000;
commit;
end;
/
CREATE OR REPLACE PROCEDURE tst_touch_recs(pi_limit int) is
v_last_iter_count int;
begin
loop
v_last_iter_count := 0;
--------------------------
for r in (select *
from TST_SAMPLE A
where rownum < pi_limit
and NOT EXISTS (select null
from TST_SAMPLE_STATUS B
where B.rec_id = A.rec_id)
FOR UPDATE SKIP LOCKED)
loop
insert into TST_SAMPLE_STATUS(rec_id, rec_status, session_id)
values (r.rec_id, 'TOUCHED', SYS_CONTEXT('USERENV', 'SID'));
v_last_iter_count := v_last_iter_count + 1;
end loop;
commit;
--------------------------
exit when v_last_iter_count = 0;
end loop;
end;
/
In the FOR-LOOP
I try to iterate over rows that:
- has no status (NOT EXISTS clause)
- is not currently locked in another thread (FOR UPDATE SKIP LOCKED)
There's no requirement for the exact amount of rows in an iteration.
Here pi_limit
is just a maximal size of one batch. The only thing needed is to process each row of TST_SAMPLE
in exactly one session.
So let's run this procedure in 3 threads.
declare
v_job_id number;
begin
dbms_job.submit(v_job_id, 'begin tst_touch_recs(100); end;', sysdate);
dbms_job.submit(v_job_id, 'begin tst_touch_recs(100); end;', sysdate);
dbms_job.submit(v_job_id, 'begin tst_touch_recs(100); end;', sysdate);
commit;
end;
Unexpectedly, we see that some rows were processed in several sessions
select count(unique rec_id) AS unique_count,
count(rec_id) AS total_count
from TST_SAMPLE_STATUS;
| unique_count | total_count |
------------------------------
| 10000 | 17397 |
------------------------------
-- run to see duplicates
select *
from TST_SAMPLE_STATUS
where REC_ID in (
select REC_ID
from TST_SAMPLE_STATUS
group by REC_ID
having count(*) > 1
)
order by REC_ID;
Please, help to recognize mistakes in implementation of procedure tst_touch_recs
.