1

I have some tables and I want to get the hash value of every row (The column can be any date type). Each table may have different schema. The following is for testing.

select *, 
       get_sha1_hash(a, b, c, x) Hash -- Get the Sha1 hash of columns a, b, c, x
from (
    select 1 a, 2 b, 3.1 c, 'aa' x from dual union all
    select 4 a, 5 b, 6.3 c, 'bb' from dual union all
    select 1 a, 2 b, 9.9 c, 'cc' from dual ) t

How to implement get_sha1_hash? Is the following good enough?

select ora_hash(a || b || c || x) from t
ca9163d9
  • 27,283
  • 64
  • 210
  • 413
  • 1
    Concatenating will work up to 32768 characters (max varchar2). Also, I'm not sure what you're asking, but to be clear `ORA_HASH` is not a `SHA1` hash. If you want SHA1, you should use DBMS_CRYPTO; see: https://stackoverflow.com/questions/1749753/making-a-sha1-hash-of-a-row-in-oracle – kfinity Oct 04 '17 at 19:57
  • I wanted to use SHA1. However, I can use ORA_hash if it's good enough. I will need to monitor the daily changes by comparing the hash code. – ca9163d9 Oct 04 '17 at 20:00
  • ORA_HASH is a 32-bit hash function, so once you get to ~9,000 unique rows, you'll have about a 1% chance of collision. SHA1 is 160-bit, so even with billions of rows, you'll have a very small chance of collision. So yeah, it depends on your circumstances. – kfinity Oct 04 '17 at 20:24

2 Answers2

2

Use a nested STANDARD_HASH function in 12c. It supports SHA1, which as @kfinity explained is less likely to produce collisions and won't overflow the character limit. (Unless you have a ridiculous number of columns, in which case you can break the function into multiple parts and hash it again.)

If you're still using 11g you can relatively easily replicate the function like in my answer here.

select t.*,
    standard_hash
    (
        standard_hash(a, 'SHA1')||'a'||
        standard_hash(b, 'SHA1')||'b'||
        standard_hash(c, 'SHA1')||'c'||
        standard_hash(x, 'SHA1')||'x'
        , 'SHA1'
    ) hash
from
(
    select 1 a, 2 b, 3.1 c, 'aa' x from dual union all
    select 4 a, 5 b, 6.3 c, 'bb' x from dual union all
    select 1 a, 2 b, 9.9 c, 'cc' x from dual union all
    --Test with NULL and with flipping column values.
    select 1 a, 2 b, 9.9 c, null x from dual union all
    select 2 a, 1 b, 9.9 c, 'cc' x from dual
) t;

There are a few pitfalls when doing this kind of row hashing:

  1. Ensure that NULLs produce a value and don't cancel everything out.
  2. Add the column name to the hash. Otherwise the values (1,2) may produce the same hash as (2,1). This wouldn't happen in my example above since it's concatenating strings in a specific order. But if you're hashing things it's possible you'll eventually want to add the hashes together, and then it can cause problems.
  3. Check that the solution works for unusual data types. For example, ORA_HASH is not deterministic for CLOBS. I can't find any information on STANDARD_HASH so check this yourself if your tables has unusual types.
Jon Heller
  • 34,999
  • 6
  • 74
  • 132
  • Thanks, it works. I tried `standard_hash( a || b || c || x, 'SHA1')` and it seems working too. How does it compare with yours? – ca9163d9 Oct 05 '17 at 14:45
  • @ca9163d9 That method won't always work. Imagine if the values are `(1 || null || 11)` and `(111 || null || null)`. They are different values but concatenate to the same string. – Jon Heller Oct 05 '17 at 14:56
  • Good catch. How about `standard_hash( 'a' || a || 'b' || b || 'c' || c || 'x' || x, 'SHA1')`? Or just `standard_hash( ',' || a || ',' || b || ',' || c || ',' || x, 'SHA1')` – ca9163d9 Oct 05 '17 at 15:00
  • @ca9163d9 That would probably work. There are still some scenarios where you could get false positives, although those are some pretty weird cases. And there's still the chance of a string concatenation error. It would probably work but I'm not sure why you'd want to take the chance. – Jon Heller Oct 08 '17 at 00:38
  • I got what you mean now. It seems it will not have false positives in my case. What will cause the string concatenation error? Your answer will not have these issue. I was worrying the function `standard_hash()` may consume a lot of CPU cycles because I will have many columns and many rows. – ca9163d9 Oct 09 '17 at 21:02
  • @ca9163d9 The expression `'a' || a || 'b' || b ...` will be evaluated before it is sent to the `standard_hash` function. If the result of that evaluation is a string larger than 32,767 characters it will throw an exception. – Jon Heller Oct 11 '17 at 02:00
1

At least in Oracle 19 this seems to work reliably, too, and it is easier to use than @Jon-Heller 's excellent answer:

WITH values1 AS
(
  SELECT
    null, 1
  FROM
    dual
  UNION 
  SELECT
    1, null
  FROM
    dual
), values2 AS
(
SELECT
  null, '1'
FROM
  dual
UNION 
SELECT
  '1', null
FROM
  dual
)

SELECT
  rawtohex(standard_hash(json_object(*))) AS hashed
FROM
  values1
UNION ALL
SELECT
  rawtohex(standard_hash(json_object(*)))
FROM
  values2

yields

HASHED
34BF2884FD8EE260AEE479600DE7CA27E4AABEDA
5C91114F78949FDEEC84BCD5FBE4057570CD5149
DF54D6C3FD95C4FC3EA274DF50CE22AF28972F04
BD5DB32944879E000D14396CF028DDB44C70687F

... or to use the original example:

SELECT
  rawtohex(standard_hash(json_object(*))) AS hashed
FROM
(
    SELECT 1 a, 2 b, 3.1 c, 'aa' x FROM dual UNION ALL
    SELECT 4 a, 5 b, 6.3 c, 'bb' x FROM dual UNION ALL
    SELECT 1 a, 2 b, 9.9 c, 'cc' x FROM dual UNION ALL
    --Test with NULL and with flipping column values.
    SELECT 1 a, 2 b, 9.9 c, null x FROM dual UNION ALL
    SELECT 2 A, 1 b, 9.9 C, 'cc' x FROM dual
) t;
HASHED
4F7C93B5F106D7E93CD43047F39EA3503509C22A
B6565A1184308B13ACC1E543D3BD573B4D845958
A86E32443F2D81E454151CBFCF64B0D67C7F8C0B
3A522C3EE28B8D62CD3278D509FD6D4B7DF433B3
83299BB84F4FBB28B0F146DA579A831F6DA5C426
Lemmes
  • 104
  • 1
  • 9