You could use a hash of the BLOB values and group by the hash along with all the other (the non-BLOB) columns, select one pk
(or rowid
, see discussion below) from each group, for example min(pk)
or min(rowid)
, and then select the corresponding rows from the table.
For hashing you could use ora_hash
, but that is only for school work. If this is a serious project, you probably need to use dbms_crypto.hash
.
Whether this is a correct solution depends on the possibility of collisions when hashing the BLOB values. In Oracle 11.1 - 11.2 you can use SHA-1 hashes (160 bits); perhaps this is enough to distinguish between your BLOB values. In higher Oracle versions, longer hashes (up to 512 bits in my version, 12.2) are available. Obviously, the longer the hashes, the slower the query - but also the higher the likelihood that you won't incorrectly identify different BLOB values as "duplicates" due to collisions.
Other responders asked about or mentioned a primary key (pk) column or columns in your table. If you have one, you can use it instead of the rowid
in my query below - but rowid
should work OK for this. (Still, pk is preferred if your table has one.)
dbms_crypto.hash
takes an integer argument (1, 2, 3, etc.) for the hashing algorithm to be used. These are defined as named constants in the package. Alas, in SQL you can't reference package constants; you need to find the values beforehand. (Or, in Oracle 12.1 or higher, you can do it on the fly, by including a function in a with
clause - but let's keep it simple.)
So, to cover Oracle 11.1 and higher, I'll assume we want to use the SHA-1 algorithm. To find its integer value from the package, I can do this:
begin
dbms_output.put_line(dbms_crypto.hash_sh1);
end;
/
3
PL/SQL procedure successfully completed.
If your Oracle version is higher, you can check for the value of hash_sh256
, for example; on my system, it's 4. Remember this number, since we will use it below.
The query is:
select {whatever columns you need, including the BLOB}
from {your table}
where rowid in (
select min(rowid)
from {your table}
group by {the non-BLOB columns},
dbms_crypto.hash({BLOB column}, 3)
)
;
Notice the number 3 used in the hash
function - that's the value of dbms_crypto.hash_sh1
, which we found earlier.