0

I have two tables, new (shared table) and old (regular in-memory table), each with less than 10,000 rows. Both tables contain columns “a“ and “b“, and values in the "a" columns are unique. I want to compare the two columns to find records that only exist in table new.

The only method I know is to append all records from the table new to the table old and then use isDuplicated to remove duplicates. I'm wondering if there's a more efficient method.

lulunolemon
  • 219
  • 3

2 Answers2

1

Simply to use not in when values in the "a" columns are unique:

select * from t1 where t1.a not in t2.a
Hanwei Tang
  • 413
  • 3
  • 10
1

Apply a where condition where the function makeKey is used to combine the values of two columns into unique keys for faster filtering, such as where makeKey(a, b) not in makeKey(old.a, old.b). Refer to the following script:

tmp = table(1 3 4 5 6 as a, take(`A`B, 5) as b, rand(10.0, 5) as val)
share tmp as new
old = table(2 3 4 7 8 as a, take(`A`B, 5) as b, rand(10.0, 5) as val)
select * from new where makeKey(a, b) not in makeKey(old.a, old.b)
Polly
  • 603
  • 3
  • 13