4

I am working on a typical problem where I need to compare TWO tables with exactly same schema for data differences. Assume database as MS SQL or ORACLE.

To be more precise here is what I am trying to achieve:

  1. I have a table ORG with some data
  2. I am creating a copy of table ORG as BACKUP
  3. Now I want to update some SPECIFIC columns in some SPECIFIC rows of the table ORG.

What is the EASIEST and EFFICIENT way to find the difference between tables BACKUP and ORG?

I see few options like using UNIONS, PIVOT, UNPIVOT, etc.. But I am confused and need some guidance on best way to proceed.

Thanks,

Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129
Krishna
  • 41
  • 1
  • 2
  • Are you trying to generate a report of the differences? Or to update a destination table from a source table? And which of the two databases are you actually using? The SQL is likely to be different. The `update` will absolutely be different. – Justin Cave Nov 18 '15 at 07:11
  • It would help if the table has a primary or unique key constraint. – Rene Nov 18 '15 at 07:42
  • Hi, the table doesnt have any primary key. I tried using JOIN and it seems to cover most of the scenarios of my interest.. This is what I did SELECT count(*) FROM ORG x INNER JOIN BACKUP q on p.city = q.city – Krishna Nov 19 '15 at 08:33
  • If the table does not have any primary key, then you should care about duplicate rows as well. – Frank Ockenfuss Nov 19 '15 at 12:58
  • your select with `inner join` compares only by column `city`. What about differences on other columns? How will you join, if more than one column is updated? – Frank Ockenfuss Nov 19 '15 at 12:59

3 Answers3

3

You could use the SET operators MINUS/INTERSECT depending on what you want, the difference or the match between the rows in the two tables.

To get the difference, use MINUS:

SELECT <here_goes_your_column_list> FROM org
MINUS
SELECT <here_goes_your_column_list> FROM backup

To get the match, use INTERSECT:

SELECT <here_goes_your_column_list> FROM org
INTERSECT
SELECT <here_goes_your_column_list> FROM backup

See more details in documentation.

Lalit Kumar B
  • 47,486
  • 13
  • 97
  • 124
  • Thank You. I tried join and it seems to work. I will explore the options you have provided incase of any further refinement. – Krishna Nov 19 '15 at 08:35
  • @Krishna No problem. If the answer helped, please mark it as answered, would help others too! – Lalit Kumar B Nov 19 '15 at 08:37
  • I had tried this solution on MSSQL, and got that Minus on Oracle is same as except on MSSQL. I need columns with different data, with the same thing as mentioned above. Can this be possible on MSSQL? – Varun Jain Aug 05 '16 at 09:32
1

try this

-- compare tables data and data2
with data as
 (select 1 as id, 'A' as val
    from dual
  union
  select 2, 'B' from dual),
data2 as
 (select 1 as id, 'A' as val
    from dual
  union
  select 2, 'BC' from dual)

 -- data not in data2 and data2 not in data:
 (select *
    from data2
  minus
  select * from data) union all
 (select *
    from data
  minus
  select * from data2);
Frank Ockenfuss
  • 2,023
  • 11
  • 26
  • Thank You. I tried join and it seems to work. I will explore the options you have provided incase of any further refinement. – Krishna Nov 19 '15 at 08:35
1

If the question is to check identity of the two tables, i.e. the answer would be yes - the tables contain the same data or no - there is a difference, there is an other approach, not too common in the relational databases, but state of the art for files.

This is the same approach as if you check the "correctness" (i.e. identity with the original) of a downloaded file using the hash code. If you see the same hash code, the answer is yes you have the right file.

How to transform this approach to the relational databases?

Oracle provides the function standard_hash that calculates MD5 hash code for a string

 select standard_hash('foo bar', 'MD5') hash_code from dual; 

 HASH_CODE                      
 --------------------------------
 327B6F07435811239BC47E1544353273 

So you can compare the column values. The function standard_hash works with CLOBs as well, so (in theory) you may concatenate the columns and rows and calculate the hash code for the whole table. But this is not the right approach, remember the hash code for files is calculate incrementally by combining the hash codes of the lines.

Here is the demonstration how to calculate the MD5 hash code using the Java class java.security.MessageDigest I'm using a Groovy script, as this is unfortunately not possible in PL/SQL.

 MessageDigest digest = MessageDigest.getInstance("MD5") 
 byte[] md5hash 

 groovyCon.eachRow ('select txt from MY_TABLE order by id') 
    {
       digest.update(it.txt.getBytes(StandardCharsets.UTF_8))
    }   

 md5hash = digest.digest();
 println md5hash.encodeHex().toString()

The script initiates the the hash code, than iterates through the lines and updates the hash code and finally presets it as a string. This is a typical approach while working with files, where the order of lines matters. In relational tables the order is not defined. You consider identical the table with (A,B,C) and (C,B,A).

See here the discussion how to combine the hash codes in an order independent matter using XOR.

Here is an example of combining of the hash code of two strings

 select UTL_RAW.BIT_XOR(standard_hash('foo', 'MD5'), standard_hash('bar', 'MD5')) hash_code from dual;

 HASH_CODE                      
 --------------------------------
 9B0805C206B7EBB8B6B9931D83E9F52A 

This approach has a big advantage that it can be implemented using PL/SQL. See here an example of an implementation of PL/SQL aggregate function MD5_XOR calculation a MD5 hash code for the whole table.

 select   MD5_XOR(txt) hash_code, count(*) cnt 
 from MY_TABLE;

 HASH_CODE                          CNT
 --------------------------------   -------
 173F1F8F85F1A154044B7629A23E949C   102 

Of course you may concatenate the parameter to calculate the hash code of the whole table

 select MD5_XOR(to_char(id)||COL_TXT|| to_char(COL_DATE,'dd.mm.yyyy hh24:mi:ss')) md5  from  MY_TABLE;

or you may use GROUP BY to parts of the table and see which groups are identical and which are different.

Much better, if this aggregate function would be natively implemented by Oracle, the performance would shine and will be much better that using the SQL compare based on set operation (that requires sorting the table). The XOR combine don't require sort and has O(N) complexity, but the user implementation suffers from context switch.

See here the idea to make it an Oracle native implementation.

Marmite Bomber
  • 19,886
  • 4
  • 26
  • 53