0

I used the search function which brought me to the following solution.

Starting Point is the following: I have one table A which stores all data. From that table I select a certain amount of records and store it in table B.

In a new statement I want to select new records from table A that do not appear in table B and store them in table c. I tried to solve this with a AND ... NOT IN statement. But I still receive records in table C that are in table B.

Important: I can only work with select statements, each statement needs to start with select as well. Does anybody have an idea where the problem in the following statement could be:

Select *
From
(Select TOP 10000 * 
FROM [table_A]
WHERE Email like '%@domain_A%' 
AND Id NOT IN (SELECT Id
FROM [table_B])) 
Union
(Select TOP 7500 * 
FROM table_A] 
WHERE Email like '%@domain_B%'
AND Id NOT IN (SELECT Id
FROM [table_B]))
Union
(SELECT TOP 5000 * 
FROM [table_A] 
WHERE Email like '%@domain_C%'  
AND Id NOT IN (SELECT Id
FROM [table_B]))
crow_doe
  • 43
  • 1
  • 6
  • 3
    Your are using `select top` without 1order by`. Each time you run the query, you might get a different set of records. – Gordon Linoff Jan 23 '18 at 16:02

3 Answers3

2

Try NOT EXISTS instead of NOT IN

SELECT
*
FROM TableA A
  WHERE NOT EXISTS
  (
     SELECT 1 FROM TableB WHERE Id = A.Id
  )
Jayasurya Satheesh
  • 7,826
  • 3
  • 22
  • 39
  • I tried NOT EXIST but receive error message when I want to expand that with even more table. So at the end there will not only be table_B and table_C, but Table_D, table_E, table_F ...and so on – crow_doe Jan 23 '18 at 16:16
  • 1
    can you tell me what is the error message that you got ? – Jayasurya Satheesh Jan 23 '18 at 16:18
0

So Basically the idea here is to select everything from table A that doesnt exists in table B and Insert all that into Table C?

    INSERT INTO Table_C 
    SELECT a.colum1, a.column2,......
    FROM [table_A]
    LEFT JOIN [table_B] ON a.id = b.ID
    WHERE a.Email like '%@domain_A%' AND b.id IS NULL
  • `EXCEPT` may be simpler and faster? – MatBailie Jan 23 '18 at 16:10
  • EXCEPT is not optimized to perform like a join. Except would be something like putting everything into a temp table and then delete everything that exists in the other temp table, A properly written join would be more efficient. –  Jan 23 '18 at 16:17
  • Unfortunately I am only allowed to use SELECT at the beginning of a statement. – crow_doe Jan 23 '18 at 16:17
  • Just exclude the Insert line, That query would give everything that exists in table A and doesn't exist in table B. –  Jan 23 '18 at 16:20
  • @Markov - Do you have any references for your performance statement? As TableB is a subset of TableA, `EXCEPT` is a logical approach. See the following and compare the execution plans... http://sqlfiddle.com/#!18/fe44e/3 The difference *(on an admittedly tiny)* set is that the `EXCEPT` plan doesn't need a filter as the optimizer already knows it's a `LEFT ANTI SEMI JOIN` rather than needing to do a `LEFT JOIN` and `FILTER` to achieve the same result. – MatBailie Jan 23 '18 at 16:26
  • @MatBailie here please see this small article http://www.sqlservergeeks.com/left-join-vs-except-in-sql-server/ it depends on the data size, If you have a lot of data the except will need to use a sort that data. here is also a good link to look into https://stackoverflow.com/questions/14693118/does-except-execute-faster-than-a-join-when-the-table-columns-are-the-same –  Jan 23 '18 at 16:35
  • Its all depends on the data size but in my experience I found Left join more practical and faster then using except especially if I select statements contains a UDFs or few sub-queries. –  Jan 23 '18 at 16:38
  • The `SORT` consideration appears to be more related to whether the relevant indexes *(potentially a clustered primary key)* match, rather than data size. The linked article provides no index, cluster or otherwise, and so requires the sort. I'll read the SO link too now :) – MatBailie Jan 23 '18 at 16:40
  • The SO answer links to an article that appears to support the idea that `EXCEPT` generally out performs `LEFT JOIN IS NULL` *(assumes suitable indexing)* but is generally slightly slower than `NOT EXISTS (correlated sub-query)`. Though I only skimmed it and may be utterly mistaken. – MatBailie Jan 23 '18 at 16:45
  • There is, I would suppose, another factor in favour of both `EXISTS` and `LEFT JOIN IS NULL` : `EXCEPT` would have a minor scalar overhead in comparing all the columns in the tables, where as the others would only needs to operate on the identifier column. Unless using column store or such like, I'd expect that to be minor, though present and likely measurable. – MatBailie Jan 23 '18 at 16:53
  • In my experience, I have found that the left join is the fastest when searching for missing rows, especially when joined on the primary key, In Except it compares a column to column and it treats NULL values as matched. –  Jan 23 '18 at 17:01
  • That makes sense to me, in that `EXCEPT` compares all fields and so the `CLUSTERED` index *(usually the primary key)* is the relevant index, when joining on any other indexed field `EXCEPT` could reasonably have more work to do that `LEFT JOIN`. The SO link, and the article it links to, however, seems to show that `NOT EXISTS` is usually the way to go and both `EXCEPT` and `LEFT JOIN` have adverse situations that `NOT EXISTS` is relatively immune to. – MatBailie Jan 23 '18 at 17:07
0

Thank you guys all for your feedback, from which I learned a lot. I was able to fix the statement with your help. Above is the statement which is working now with the desired results:

Select Id
From
(Select TOP 10000 * FROM Table_A
WHERE Email like '%@domain_a%'
AND Id NOT IN (SELECT Id
FROM Table_B)
order by No desc) t1
Union
Select Id
From
(Select TOP 7500 * FROM Table_A
WHERE Email like '%@domain_b%'
AND Id NOT IN (SELECT Id
FROM Table_B)
order by No desc) t2
Union
Select Id
From
(SELECT TOP 5000 * FROM Table_A
WHERE Email like '%@domain_c%'  
AND Id NOT IN (SELECT Id
FROM Table_B)
order by No desc) t3
crow_doe
  • 43
  • 1
  • 6