0

I've been trying to find a way of comparing huge amount of data in two different tables but I am not sure if this is the correct approach.

So, this is why I am asking it here to understand a problem more and getting some clarity in order to solve it.

As title says I have two tables with less than 2 million rows of data and I need to a data comparison on them. So basically I only need to check if data in one table matches the data from other tables. Each table comes from separate database and I've managed to create views in order to have the same column names.

Here is my approach which gives me differences from two tables.

SELECT db1.ADDRESS_ID, db1.address
FROM UAT_CUSTOMERS_DB1 db1
EXCEPT
SELECT db2.ADDRESS_ID, db2.address
FROM UAT_CUSTOMERS_DB2 db2;

I have two questions, so basically this works:

  1. It seems pretty straightforward but can someone explain to me in a bit more depth how this query works with such speed ? Yes I know - read docs but I would really appreciate alternative answer.

  2. How can I include all the columns from tables without specifying manually each column name ?

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
whatismyname123
  • 149
  • 1
  • 13
  • 1) Hashtables, probably and a boatload of memory 2) `SELECT * FROM x EXCEPT SELECT * FROM y` is fine so long as and y have the same column count – Caius Jard Mar 21 '21 at 22:07
  • Same column count *and* order. It's much more robust to be explicit about column names. – Shawn Mar 22 '21 at 01:00

0 Answers0