20

I would like to find the distinct CLOB values that can assume the column called CLOB_COLUMN (of type CLOB) contained in the table called COPIA.

I have selected a PROCEDURAL WAY to solve this problem, but I would prefer to give a simple SELECT as the following: SELECT DISTINCT CLOB_COLUMN FROM TABLE avoiding the error "ORA-00932: inconsistent datatypes: expected - got CLOB"

How can I achieve this?

Thank you in advance for your kind cooperation. This is the procedural way I've thought:

-- Find the distinct CLOB values that can assume the column called CLOB_COLUMN (of type CLOB)
-- contained in the table called COPIA
-- Before the execution of the following PL/SQL script, the CLOB values (including duplicates) 
-- are contained in the source table, called S1
-- At the end of the excecution of the PL/SQL script, the distinct values of the column called CLOB_COLUMN
-- can be find in the target table called S2

BEGIN
   EXECUTE IMMEDIATE 'TRUNCATE TABLE S1 DROP STORAGE';

   EXECUTE IMMEDIATE 'DROP TABLE S1 CASCADE CONSTRAINTS PURGE';
EXCEPTION
   WHEN OTHERS
   THEN
      BEGIN
         NULL;
      END;
END;

BEGIN
   EXECUTE IMMEDIATE 'TRUNCATE TABLE S2 DROP STORAGE';

   EXECUTE IMMEDIATE 'DROP TABLE S2 CASCADE CONSTRAINTS PURGE';
EXCEPTION
   WHEN OTHERS
   THEN
      BEGIN
         NULL;
      END;
END;

CREATE GLOBAL TEMPORARY TABLE S1
ON COMMIT PRESERVE ROWS
AS
   SELECT CLOB_COLUMN FROM COPIA;

CREATE GLOBAL TEMPORARY TABLE S2
ON COMMIT PRESERVE ROWS
AS
   SELECT *
     FROM S1
    WHERE 3 = 9;

BEGIN
   DECLARE
      CONTEGGIO   NUMBER;

      CURSOR C1
      IS
         SELECT CLOB_COLUMN FROM S1;

      C1_REC      C1%ROWTYPE;
   BEGIN
      FOR C1_REC IN C1
      LOOP
         -- How many records, in S2 table, are equal to c1_rec.clob_column?
         SELECT COUNT (*)
           INTO CONTEGGIO
           FROM S2 BETA
          WHERE DBMS_LOB.
                 COMPARE (BETA.CLOB_COLUMN,
                          C1_REC.CLOB_COLUMN) = 0;

         -- If it does not exist, in S2, a record equal to c1_rec.clob_column, 
         -- insert c1_rec.clob_column in the table called S2
         IF CONTEGGIO = 0
         THEN
            BEGIN
               INSERT INTO S2
                    VALUES (C1_REC.CLOB_COLUMN);

               COMMIT;
            END;
         END IF;
      END LOOP;
   END;
END;
OMG Ponies
  • 325,700
  • 82
  • 523
  • 502
UltraCommit
  • 2,236
  • 7
  • 43
  • 61

8 Answers8

14

If it is acceptable to truncate your field to 32767 characters this works:

select distinct dbms_lob.substr(FIELD_CLOB,32767) from Table1
peter.hrasko.sk
  • 4,043
  • 2
  • 19
  • 34
Laurent
  • 151
  • 1
  • 2
  • 2
    **If** truncation is tolerable, this solution is a lot more manageable than the accepted solution. – halloleo Apr 18 '18 at 02:07
8

You could compare the hashes of the CLOB to determine if they are different:

SELECT your_clob
  FROM your_table
 WHERE ROWID IN (SELECT MIN(ROWID) 
                   FROM your_table
                  GROUP BY dbms_crypto.HASH(your_clob, dbms_crypto.HASH_SH1))

Edit:

The HASH function doesn't guarantee that there will be no collision. By design however, it is really unlikely that you will get any collision. Still, if the collision risk (<2^80?) is not acceptable, you could improve the query by comparing (with dbms_lob.compare) the subset of rows that have the same hashes.

Community
  • 1
  • 1
Vincent Malgrat
  • 66,725
  • 9
  • 119
  • 171
  • Interesting solutions, but my Oracle user has not access to DBMS_CRYPTO package. How can I workaround? – UltraCommit Oct 13 '10 at 11:29
  • 1
    Hashes are not guaranteed to be different for different inputs. – Janek Bogucki Oct 13 '10 at 11:32
  • 1
    @Janek Bogucki: given the extremely low probability of a SHA1 hash collision, you can safely assume that two naturally occuring strings (i.e. not reversed engineered explicitely for this purpose) who have the same SHA1 hash are equal :) – Vincent Malgrat Oct 13 '10 at 11:42
  • Okay, but in Oracle PL/SQL is not available another package which implements the same functionality of DBMS_CRYPTO, that I can't see with my Oracle user? – UltraCommit Oct 13 '10 at 11:45
  • I have thought to ORA_HASH, but is useless for my scope. – UltraCommit Oct 13 '10 at 11:46
  • 1
    @The chicken in the kitchen: ask for the right to use this package ? – Vincent Malgrat Oct 13 '10 at 11:50
  • No, sorry, I am not authorized to ask the right to use DBMS_CRYPTO package :-( – UltraCommit Oct 13 '10 at 11:53
  • I have tried with SYSMAN.ENCRYPT and SYSMAN.DECRYPT with no good results: I obtain the following error: ORA-06502: PL/SQL: numeric or value error: character string buffer too small ORA-06512: at "SYSMAN.ENCRYPT", line 10 – UltraCommit Oct 13 '10 at 12:18
  • 1
    Watch out for null arguments to dbms_crypto.HASH(x, 3). +1 for hash collision probability observation. – Janek Bogucki Oct 13 '10 at 12:42
  • I am not authorized to use DBMS_CRYPTO.HASH: my Oracle user does not see this package. – UltraCommit Oct 13 '10 at 12:50
  • 3
    You say: "this user needs to use DBMS_CRYPTO, otherwise your problem will not be solved." There are no real security risks with the grant. It's common practice to only grant privileges as required. Here, you have a case where it's required, so they should grant it. – Jeffrey Kemp Oct 13 '10 at 13:35
  • Thanks for your kind explanation Jeffrey, but in my work environment the use of this native Oracle package is not allowed. My request has been declined. – UltraCommit Oct 13 '10 at 14:13
  • Perhaps if you can't get permission for the user to have access to the whole package, then maybe you can create a wrapper function under a more privileged user, and then the application's user can avoid using DBMS_CRYPTO directly. – Adam Hawkes Oct 20 '10 at 02:30
  • How can I create a wrapper function similar to DBMS_CRYPTO?? – UltraCommit Oct 29 '10 at 14:29
7

add TO_CHAR after distinct keyword to convert CLOB to CHAR

SELECT DISTINCT TO_CHAR(CLOB_FIELD) from table1;   //This will return distinct values in CLOB_FIELD
Suraj Rao
  • 29,388
  • 11
  • 94
  • 103
Tahir Usman
  • 71
  • 1
  • 4
6

Use this approach. In table profile column content is NCLOB. I added the where clause to reduce the time it takes to run which is high,

with
  r as (select rownum i, content from profile where package = 'intl'),
  s as (select distinct (select min(i) from r where dbms_lob.compare(r.content, t.content) = 0) min_i from profile t where t.package = 'intl')
select (select content from r where r.i = s.min_i) content from s
;

It is not about to win any prizes for efficiency but should work.

Janek Bogucki
  • 5,033
  • 3
  • 30
  • 40
  • 1
    With 100 records it works great, but with 5500 record it is too slow. Now I am trying to use ROWID instead of ROWNUM: ROWID is more efficient. – UltraCommit Oct 13 '10 at 11:16
  • 2
    The problem will not be rownum vs rowid. The problem will be the O(n^2) or O(n^3) (just a guess) runtime characteristics. – Janek Bogucki Oct 13 '10 at 11:31
3

select distinct DBMS_LOB.substr(column_name, 3000) from table_name;

Nitin Pawar
  • 1,634
  • 19
  • 14
2

If truncating the clob to the size of a varchar2 won't work, and you're worried about hash collisions, you can:

  • Add a row number to every row;
  • Use DBMS_lob.compare in a not exists subquery. Exclude duplicates (this means: compare = 0) with a higher rownum.

For example:

create table t (
  c1 clob
);

insert into t values ( 'xxx' );
insert into t values ( 'xxx' );
insert into t values ( 'yyy' );

commit;

with rws as (
   select row_number () over ( order by rowid ) rn,
          t.*
   from   t
)
  select c1 from rws r1
  where  not exists (
    select * from rws r2
    where  dbms_lob.compare ( r1.c1, r2.c1 ) = 0
    and    r1.rn > r2.rn
  );

C1    
xxx   
yyy  
UltraCommit
  • 2,236
  • 7
  • 43
  • 61
Chris Saxon
  • 9,105
  • 1
  • 26
  • 42
0

To bypass the oracle error, you have to do something like this :

SELECT CLOB_COLUMN FROM TABLE COPIA C1 WHERE C1.ID IN (SELECT DISTINCT C2.ID FROM COPIA C2 WHERE ....)

Salim Hamidi
  • 20,731
  • 1
  • 26
  • 31
0

I know this is an old question but I believe I've figure out a better way to do what you are asking.

It is kind of like a cheat really...The idea behind it is that You can't do a DISTINCT of a Clob column but you can do a DISTINCT on a Listagg function of a Clob_Column...you just need to play with the partition clause of the Listagg function to make sure it will only return one value.

With that in mind...here is my solution.

SELECT DISTINCT listagg(clob_column,'| ') within GROUP (ORDER BY unique_id) over (PARTITION BY unique_id) clob_column 
 FROM copia;
user3768804
  • 139
  • 1
  • 1
  • 15