3

Is there a way to verify if the value in blob in more rows is identical in the Oracle database? The blob is the same column.

I have the size of the blob stored in another column but I need to know if the value is equal too.

The value stored is a serialized object.

Random42
  • 8,989
  • 6
  • 55
  • 86
  • possible duplicate of [SQL - How do you compare a CLOB](http://stackoverflow.com/questions/110587/sql-how-do-you-compare-a-clob) – D'Arcy Rittich Aug 29 '12 at 15:11
  • Unfortunately, supposing you want to identify _which_ rows contain the identical values, there's no way to avoid the `JOIN`, period. At minimum, add the size-of-object column in the conditions, and make sure it comes first in any indices (that reference the `BLOB`) - this should help for comparisons (by eliminating differently sized rows). – Clockwork-Muse Aug 29 '12 at 15:11
  • There is no problem if JOINs are used. – Random42 Aug 29 '12 at 15:12

1 Answers1

3

I would recommend adding another column which contains the hash of the blob value. When you store the blob you also calculate the hash value, using SHA256 for example, and store that. When you later want distinct values you just use this new column.

Magnus
  • 45,362
  • 8
  • 80
  • 118
  • Is there any special command to calculate the hash of an exisiting blob value directly by SQL command? – Random42 Aug 29 '12 at 15:29
  • 2
    Read [this question](http://stackoverflow.com/questions/1749753/making-a-sha1-hash-of-a-row-in-oracle) for information of how to do it in the database. – Magnus Aug 29 '12 at 15:37
  • 1
    I would recommend that you store that hash value and not calculate it every time for all rows since it is an expensive operation. – Magnus Aug 29 '12 at 15:40
  • I would just add that you use hash and perform additional equality condition, hoping that hash will dismiss obviously different blobs and that equality will prevent you from suffering from hash collisions. – Daniel Mošmondor Aug 29 '12 at 15:48
  • @DanielMošmondor If you are using a hash algorithm such as `SHA256` or `SHA512` which does not currently have any known vulnerabilities, I don't see the need to perform additional equality comparison. Also if you add additional equality comparison the query engine decides the order to execute them and it might end up before the hash comparison. – Magnus Aug 29 '12 at 16:12
  • @Magnus Due to the possibility of [hash collisions](http://en.wikipedia.org/wiki/Hash_collision), just because you have two equal hash values does not mean you have two equal BLOBs. – GriffeyDog Aug 29 '12 at 18:35
  • @GriffeyDog The probability of collision is about p2/2n+1. If we have a billion BLOBS (p) using the `SHA-256` (n=256) algorithm then the probability of a collision is about 4.3*10^-60. Which is negligible small. As comparison the probability of an asteroid hitting the earth in the the next second to about 10^-15. – Magnus Aug 29 '12 at 19:33