If the question is to check identity of the two tables, i.e. the answer would be yes
- the tables contain
the same data or no
- there is a difference, there is an other approach, not too common in the relational databases,
but state of the art for files.
This is the same approach as if you check the "correctness" (i.e. identity with the original) of a downloaded file
using the hash code. If you see the same hash code, the answer is yes
you have the right file.
How to transform this approach to the relational databases?
Oracle provides the function standard_hash
that calculates MD5
hash code for a string
select standard_hash('foo bar', 'MD5') hash_code from dual;
HASH_CODE
--------------------------------
327B6F07435811239BC47E1544353273
So you can compare the column values.
The function standard_hash
works with CLOB
s as well, so (in theory) you may concatenate
the columns and rows and calculate the hash code for the whole table. But this is not the right approach, remember
the hash code for files is calculate incrementally by combining the hash codes of the lines.
Here is the demonstration how to calculate the MD5
hash code using the Java class java.security.MessageDigest
I'm using a Groovy script, as this is unfortunately not possible in PL/SQL.
MessageDigest digest = MessageDigest.getInstance("MD5")
byte[] md5hash
groovyCon.eachRow ('select txt from MY_TABLE order by id')
{
digest.update(it.txt.getBytes(StandardCharsets.UTF_8))
}
md5hash = digest.digest();
println md5hash.encodeHex().toString()
The script initiates the the hash code, than iterates through the lines and update
s the hash code and finally
presets it as a string. This is a typical approach while working with files, where the order of lines matters. In relational tables
the order is not defined. You consider identical the table with (A,B,C) and (C,B,A).
See here the discussion how to combine the hash codes in an order independent matter using XOR
.
Here is an example of combining of the hash code of two strings
select UTL_RAW.BIT_XOR(standard_hash('foo', 'MD5'), standard_hash('bar', 'MD5')) hash_code from dual;
HASH_CODE
--------------------------------
9B0805C206B7EBB8B6B9931D83E9F52A
This approach has a big advantage that it can be implemented using PL/SQL. See here an example
of an implementation of PL/SQL aggregate function MD5_XOR
calculation a MD5 hash code for the whole table.
select MD5_XOR(txt) hash_code, count(*) cnt
from MY_TABLE;
HASH_CODE CNT
-------------------------------- -------
173F1F8F85F1A154044B7629A23E949C 102
Of course you may concatenate the parameter to calculate the hash code of the whole table
select MD5_XOR(to_char(id)||COL_TXT|| to_char(COL_DATE,'dd.mm.yyyy hh24:mi:ss')) md5 from MY_TABLE;
or you may use GROUP BY
to parts of the table and see which groups are identical and which are different.
Much better, if this aggregate function would be natively implemented by Oracle, the performance would shine and will be much better
that using the SQL compare based on set operation (that requires sorting the table). The XOR
combine don't require
sort and has O(N)
complexity, but the user implementation suffers from context switch.
See here the idea to make it an Oracle native implementation.