I have a PL/SQL package in Oracle whose key function is:

function checkDuplicate(in_id in varchar2) return boolean is
  cnt number;
  begin
      select count(*)
        into cnt
        from tbl_Log t
       where t.id = in_id;

      if (cnt > 0) then
        -- the request is a duplicate on in_id
        return false;
      end if;

    insert into tbl_log (id, "DATE") values (in_id, sysdate);
    return true;
  end;

When two requests call this function concurrently, both of them pass the check and two rows with the same in_id are inserted into tbl_log. Note: tbl_log doesn't have a PK for performance reasons. Are there any solutions?

Sam
    Use a [sequence](https://docs.oracle.com/cd/B28359_01/server.111/b28310/views002.htm#ADMIN11792) to generate unique values. – krokodilko Aug 19 '17 at 06:07
  • my unique value was generated in the application (java) and is a UUID so, I can't use a sequence. I want to recognize the duplication with this function. after calling this function some tasks execute in the package that I didn't show to simplify my question. – Sam Aug 19 '17 at 06:17
    In such a case you have to serialize access to this function using some locks. The easiest one is to [lock the table](https://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_9015.htm) tbl_log, and then commit or rollback at the end to release the lock. You can also use a constraint to prevent inserting duplicate rows into the table. – krokodilko Aug 19 '17 at 06:29
  • Actually, my package is under very heavy load: a lot of application threads call this package, so I can't lock the whole table. What is your opinion about the `dbms_lock` package? Is there a benefit to using it? – Sam Aug 19 '17 at 06:32
    So `tbl_log` has no unique constraint for performance reasons, and yet your requirement to check for duplicates means an extra `select count(*)` before each insert, and even then you can't be sure another session hasn't inserted the same row at the same time and given you a duplicate anyway. Are those performance reasons still justified? – William Robertson Aug 19 '17 at 23:24
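The `dbms_lock` idea raised in the comments would serialize access per id rather than locking the whole table. A sketch under stated assumptions (the id value, lock-name prefix, and timeout are illustrative; note that `allocate_unique` issues a commit as a side effect):

```sql
declare
  in_id    constant varchar2(64) := 'some-uuid';  -- illustrative value
  l_handle varchar2(128);
  l_result integer;
begin
  -- one named lock per id, so unrelated ids do not contend with each other
  dbms_lock.allocate_unique('tbl_log#' || in_id, l_handle);

  -- request it in exclusive mode, waiting up to 5 seconds
  l_result := dbms_lock.request(l_handle,
                                dbms_lock.x_mode,
                                timeout           => 5,
                                release_on_commit => true);
  if l_result = 0 then
    -- safe to run the count/insert sequence here;
    -- the lock is released automatically at commit
    null;
  end if;
end;
/
```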

3 Answers


" both of them passed this function and two the same in_id inserted in tbl_log"

Oracle operates at the READ COMMITTED isolation level, so the select can only see committed records. If one thread has inserted a record for a given value but hasn't committed the transaction, another thread looking for the same value will come up empty.
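The race can be sketched as a two-session timeline (table and columns are from the question; the interleaving and the id value are illustrative):

```sql
-- Session A                              -- Session B
select count(*) from tbl_log
 where id = 'X';   -- sees 0 rows
                                          select count(*) from tbl_log
                                           where id = 'X';   -- also 0:
                                           -- A's insert is not committed yet
insert into tbl_log (id, "DATE")
values ('X', sysdate);
                                          insert into tbl_log (id, "DATE")
                                          values ('X', sysdate);
commit;                                   commit;
-- result: two rows with id = 'X'
```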

"Note: tbl_log doesn't have a PK for performance issues. "

The lessons of history are clear: tables without integrity constraints inevitably fall into data corruption.

"I want to recognize the duplication with this function ... Are there any solutions?"

You mean apart from adding a primary key constraint? There is no more efficient way of trapping duplication than a primary key. Maybe you should revisit those performance issues. Plenty of applications manage to handle millions of inserts and still enforce integrity constraints. You should also look at the Java layer: why have you got multiple threads submitting the same ID?
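For completeness, the fix this answer argues for is a single DDL statement (the constraint name is illustrative):

```sql
alter table tbl_log add constraint tbl_log_pk primary key (id);
```

With this in place, a concurrent second insert of the same id blocks until the first transaction ends, then raises ORA-00001 if that transaction committed.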

APC

Note: tbl_log doesn't have a PK for performance issues.

There is no PK or unique index on this column in order to "avoid performance issues", yet hundreds or thousands of queries like SELECT ... WHERE t.id = ... run against this table. Without an index on the column, every one of those queries must do a full table scan!
This can cause much bigger performance issues in my opinion.
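One way to check this claim on your own system (a sketch; the bind name is illustrative) is to compare execution plans before and after indexing the column:

```sql
explain plan for
  select count(*) from tbl_log t where t.id = :in_id;

select * from table(dbms_xplan.display);
-- without an index on id this typically shows TABLE ACCESS FULL;
-- with a unique index it becomes an INDEX UNIQUE SCAN
```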


Since the values of this column are UUIDs, there is very little chance of colliding values. In this case I would prefer not to use any locks.
Just put a unique constraint (backed by a unique index) on this column to prevent two duplicate values from being inserted:

ALTER TABLE tbl_log ADD CONSTRAINT tbl_log_id_must_be_unique UNIQUE( id );

and then use this implementation of your function:

create or replace function checkDuplicate(in_id in varchar2) return boolean is
  begin
    insert into tbl_log (id,"DATE") values(in_id , sysdate);
    return true;
  exception when dup_val_on_index then
    return false;
  end;
  /
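A minimal caller for this version of the function might look like this (the UUID literal is illustrative):

```sql
declare
  l_first boolean;
begin
  l_first := checkDuplicate('3f2504e0-4f89-11d3-9a0c-0305e82c3301');
  if l_first then
    dbms_output.put_line('new request, continue processing');
  else
    dbms_output.put_line('duplicate request, skip');
  end if;
  commit;  -- releases the row lock taken by the insert
end;
/
```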

In the vast majority of cases the function simply inserts a new record into the table without any delay, because the values are UUIDs.
In the seldom case of a duplicated value that is already committed in the table, the insert fails immediately, without any delay.
In the very rare (almost impossible) case where two threads try to insert the same UUID simultaneously, the second thread blocks on the INSERT and waits until the first thread commits or rolls back.

krokodilko

As per your condition: since you are reluctant to use a primary key to enforce data integrity (which will lead to data corruption sooner or later), I would suggest using a MERGE statement and keeping an audit log of the latest thread updating the table. This way you eliminate duplicate records and also keep track of when, and from which thread, the id was last updated. Hope the snippet below helps.

---Create dummy table for data with duplicates

DROP TABLE dummy_hist;

CREATE TABLE dummy_hist AS
SELECT LEVEL COL1,
  'AVRAJIT'
  ||LEVEL COL2,
  SYSTIMESTAMP ACTUAL_INSERTION_DT,
  SYSTIMESTAMP UPD_DT,
  1 thread_val
FROM DUAL
  CONNECT BY LEVEL < 100;


--Update upd_dt
UPDATE dummy_hist SET upd_dt = NULL,thread_val = NULL;


SELECT * FROM dummy_hist;

--Create function

CREATE OR REPLACE FUNCTION checkDuplicate(
    in_id        IN VARCHAR2,
    p_thread_val IN NUMBER)
  RETURN BOOLEAN
IS
BEGIN
  MERGE INTO dummy_hist a
  USING (SELECT in_id val FROM dual) b
     ON (a.col1 = b.val)
  WHEN MATCHED THEN
    UPDATE
       SET a.upd_dt     = systimestamp,
           a.thread_val = p_thread_val
  WHEN NOT MATCHED THEN
    INSERT (a.col1, a.col2, a.actual_insertion_dt, a.upd_dt, a.thread_val)
    VALUES (b.val, 'AVRAJIT', systimestamp, NULL, p_thread_val);
  COMMIT;
  RETURN true;
END;
/

--Execute the function
DECLARE
  rc BOOLEAN;
BEGIN
  FOR I IN
  (SELECT LEVEL + 7 LVL FROM DUAL CONNECT BY LEVEL <= 43 -- generates 8..50; BETWEEN 8 AND 50 would stop after level 1
  )
  LOOP
    rc:=checkduplicate(I.LVL,3);
  END LOOP;
END;
/
Avrajit Roy
  • Firstly this implements a business rules which the OP has not stated (what happens in the update branch). Secondly it doesn't solve the contention between concurrent transactions. – APC Aug 21 '17 at 08:36
  • Main intention is to avoid duplication of records. With this approach the OP gets the audit log details too, plus avoids the duplication. But anyhow the design approach solely depends on the OP. – Avrajit Roy Aug 21 '17 at 08:43
  • Your function includes a COMMIT, unlike the OP's. So it appears to solve the duplication issue but it doesn't. If two sessions fire with the same new ID at exactly the same time they would both attempt to INSERT; both sessions would succeed because there is no unique key violation. This would be more likely in the OP's situation with the transaction being managed at a higher level, with a wider window of concurrent processing. Really this is an architectural problem, so it's pretty difficult to fix it with a piece of SQL. – APC Aug 21 '17 at 09:07
    Exactly. This should have been identified during the architectural phase. Anyhow, a PRIMARY key will somewhat reduce the mess already coded :) – Avrajit Roy Aug 21 '17 at 09:27