1

For two tables I would like to compare a given set of columns (two in the example below but could be more) to see if their values are identical.

The problem that I'm facing is when the columns contain null values (in the example below I'm referring to when var_1 and var_2 contain nulls). In this example one column is of date format and the other is character, but as this can vary I'm trying to avoid a solution that involves filling in the null with some value.

Given that the number of columns can vary, I would ideally like a solution that either counts rows where there is a mismatch or returns one result per row.

I first tried below...

SELECT SUM(a.var_1 = b.var_1, a.var_2 = b.var_2) AS result
FROM table_1 a
INNER JOIN table_2 b USING (key) 

...but as per other SO posts you can't aggregate booleans.

Then tried a joining approach with the idea to check the row numbers before and after the join...

SELECT COUNT(*)
FROM table_1 a
INNER JOIN table_2 b 
        ON a.key = b.key  
       AND a.var_1 = b.var_1
       AND a.var_2 = b.var_2

...but this only joins non null values.

Finally tried to have one check per row

SELECT (a.var_1 = b.var_1) = (a.var_2 = b.var_2) AS result
FROM table_1 a
INNER JOIN table_2 b USING (key)

...but when there is a null present in the row the result is null.

Any help on a better approach would be appreciated.

Community
  • 1
  • 1
Sam Gilbert
  • 1,642
  • 3
  • 21
  • 38

2 Answers2

2

You can use is not distinct from

SELECT COUNT(*)
FROM table_1 a
  JOIN table_2 b 
    ON a.key = b.key  
   AND (a.var_1, a.var_2) is not distinct from (b.var_1, b.var_2);

If you want to compare all columns in the tables, you can simplify that to:

SELECT COUNT(*)
FROM table_1 a
  JOIN table_2 b 
    ON a is not distinct from b;
1

Is this what you want?

SELECT SUM( (a.var_1 is not distinct from b.var_1 AND a.var_2 is not distinct from b.var_2)::int) AS result
FROM table_1 a INNER JOIN
     table_2 b
     USING (key) ;

The ANSI-standard Postgres operator is not distinct from is a (rather verbose) NULL-safe equality operator. That is, if two values are NULL, then it returns true rather than NULL.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786