If the tables are almost identical it would be faster to compare hashes of chunks of data, and then only compare all the data for the chunks with a difference.
I bet most of the run time was spent transferring and transforming data. Reading 100,000 rows from an Employee table probably only takes a few seconds in the database. Using the function DBMS_SQLHASH.GETHASH
, Oracle can quickly generate hashes for a large set of data. (You will probably need to have a DBA run grant execute on sys.dbms_sqlhash to your_user;
)
For example, imagine these two tables (in reality they are much larger, and on separate databases):
create table EmployeeTable1 as
select 1 a, 2 b, 3 c, 'abcdefg' EmplName from dual union all
select 1 a, 2 b, 3 c, 'bcdefg' EmplName from dual union all
select 1 a, 2 b, 3 c, 'cdefg' EmplName from dual;
create table EmployeeTable2 as
select 1 a, 2 b, 3 c, 'abcdefg' EmplName from dual union all
select 1 a, 2 b, 3 c, 'bcdefg' EmplName from dual union all
select 9 a, 9 b, 9 c, 'cdefg' EmplName from dual;
Generate a hash for each first-letter of the employee names.
--Table 1 hashes:
select 'a', dbms_sqlhash.gethash('select a,b,c,EmplName from EmployeeTable1 where EmplName like ''a%'' order by 1,2,3', 3) from dual union all
select 'b', dbms_sqlhash.gethash('select a,b,c,EmplName from EmployeeTable1 where EmplName like ''b%'' order by 1,2,3', 3) from dual union all
select 'c', dbms_sqlhash.gethash('select a,b,c,EmplName from EmployeeTable1 where EmplName like ''c%'' order by 1,2,3', 3) from dual;
a 923920839BFE25A44303718523CBFE1CEBB11053
b 355CB0FFAEBB60ECE2E81F3C9502F2F58A23F8BC
c F2D94D7CC0C82329E576CD867CDC52D933C37C2C <-- DIFFERENT
--Table 2 hashes:
select 'a', dbms_sqlhash.gethash('select a,b,c,EmplName from EmployeeTable2 where EmplName like ''a%'' order by 1,2,3', 3) from dual union all
select 'b', dbms_sqlhash.gethash('select a,b,c,EmplName from EmployeeTable2 where EmplName like ''b%'' order by 1,2,3', 3) from dual union all
select 'c', dbms_sqlhash.gethash('select a,b,c,EmplName from EmployeeTable2 where EmplName like ''c%'' order by 1,2,3', 3) from dual;
a 923920839BFE25A44303718523CBFE1CEBB11053
b 355CB0FFAEBB60ECE2E81F3C9502F2F58A23F8BC
c 6B7B1D374568B353E9A37EB35B4508B6AE665F8A <-- DIFFERENT
The Python program only has to compare the hashes, and can quickly discover that "a" and "b" are identical, and the difference is in employees that start with "c". Then the program only has to compare all the details for a smaller set of results.
Unfortunately, this solution requires more coding, as you have to build loops in Python, and construct multiple SQL statements. And the solution will be slower if your tables are wildly different, and you'll need to play around with the data to find the right chunk size.