How to use Scalar Subquery on a single table for lob column?

Question

I have a below query in Oracle having duplicate rows, where file_data is a BLOB column.

SELECT attachsysfilename, file_seq, version, file_size, lastupddttm, lastupdoprid, file_data 
from PS_LP_EX_FILEATTCH

I want to apply distinct clause on top of it to get unique records. But unable to do so because of BLOB column.

Can someone please help in this regards?

How can I use the Scalar subquery on file_data column to get the DISTINCT records from the table?

PLease edit the question and post the table definition, sample data, and expected results. thanks — OldProgrammer, Feb 27 '21 at 20:02
Does this answer your question? [Oracle - How can I perform a SELECT DISTINCT on all fields except a BLOB?](https://stackoverflow.com/questions/26721525/oracle-how-can-i-perform-a-select-distinct-on-all-fields-except-a-blob) — philipxy, Mar 08 '21 at 13:17

score 0 · Answer 1 · answered Feb 27 '21 at 20:21

0

assuming you have a primaru key for the PS_LP_EX_FILEATTCH table's row you could rey using subquery for an aggreagted result of the related primary key

select t.*, ps.file_data 
from (
    SELECT min(pk) my_id
         attachsysfilename
        ,file_seq
        ,version
        ,file_size
        ,lastupddttm
        ,lastupdoprid
    from PS_LP_EX_FILEATTCH
    group by attachsysfilename
        ,file_seq
        ,version
        ,file_size
        ,lastupddttm
        ,lastupdoprid
) t 
inner join PS_LP_EX_FILEATTCH ps ON t.my_id = ps.pk

answered Feb 27 '21 at 20:21

ScaisEdge

131,976
10
91
107

I don't see how this helps. Two rows may have identical values in all the selected columns **except** the BLOB column, but the BLOB values are all distinct. The query **should** return all rows (there are no duplicates, since the BLOBs are all distinct), but your query will return just one row (seeing ALL of them as duplicates of a single row). What am I missing, if anything? – Feb 27 '21 at 22:07

score 0 · Answer 2 · answered Feb 27 '21 at 23:06

You could use a hash of the BLOB values and group by the hash along with all the other (the non-BLOB) columns, select one pk (or rowid, see discussion below) from each group, for example min(pk) or min(rowid), and then select the corresponding rows from the table.

For hashing you could use ora_hash, but that is only for school work. If this is a serious project, you probably need to use dbms_crypto.hash.

Whether this is a correct solution depends on the possibility of collisions when hashing the BLOB values. In Oracle 11.1 - 11.2 you can use SHA-1 hashes (160 bits); perhaps this is enough to distinguish between your BLOB values. In higher Oracle versions, longer hashes (up to 512 bits in my version, 12.2) are available. Obviously, the longer the hashes, the slower the query - but also the higher the likelihood that you won't incorrectly identify different BLOB values as "duplicates" due to collisions.

Other responders asked about or mentioned a primary key (pk) column or columns in your table. If you have one, you can use it instead of the rowid in my query below - but rowid should work OK for this. (Still, pk is preferred if your table has one.)

dbms_crypto.hash takes an integer argument (1, 2, 3, etc.) for the hashing algorithm to be used. These are defined as named constants in the package. Alas, in SQL you can't reference package constants; you need to find the values beforehand. (Or, in Oracle 12.1 or higher, you can do it on the fly, by including a function in a with clause - but let's keep it simple.)

So, to cover Oracle 11.1 and higher, I'll assume we want to use the SHA-1 algorithm. To find its integer value from the package, I can do this:

begin
  dbms_output.put_line(dbms_crypto.hash_sh1);
end;
/

3


PL/SQL procedure successfully completed.

If your Oracle version is higher, you can check for the value of hash_sh256, for example; on my system, it's 4. Remember this number, since we will use it below.

The query is:

select {whatever columns you need, including the BLOB}
from   {your table}
where  rowid in (
                  select min(rowid)
                  from   {your table}
                  group  by {the non-BLOB columns},
                            dbms_crypto.hash({BLOB column}, 3)
                )
;

Notice the number 3 used in the hash function - that's the value of dbms_crypto.hash_sh1, which we found earlier.

score 0 · Answer 3 · answered Mar 08 '21 at 12:02

I used below query to get the distinct rows including BLOB column.

select
attachsysfilename,
file_seq,
version,
lastupddttm,
lastupdoprid,
file_data,
ROW_NUMBER() OVER (PARTITION BY attachsysfilename,file_seq,version,lastupddttm,lastupdoprid ORDER BY attachsysfilename ,file_seq ,version,lastupddttm DESC,lastupdoprid) RNK
 from ps_lp_ex_fileattch a
 ) WHERE RNK=1

How to use Scalar Subquery on a single table for lob column?

3 Answers3