3

I have a query in my production environment which is taking long time to execute. I did not write this query but I must find a way to make it quicker since it is causing a big performance issue at the moment. I need to replace NOT IN with Left Join but not sure how to rewrite it. It looks like following at the moment

SELECT TOP 1 IT.ITEMID
FROM   (SELECT CAST(ITEMID AS NUMERIC) + 1 ITEMID
        FROM   Items
        WHERE  ISNUMERIC(ITEMID) = 1
               AND CAST(ITEMID AS NUMERIC) >= 50000) IT
WHERE  IT.ITEMID NOT IN (SELECT CAST(ITEMID AS NUMERIC) ITEMID
                         FROM   Items
                         WHERE  ISNUMERIC(ITEMID) = 1)
ORDER  BY IT.ITEMID 

Kindly suggest how am I supposed to rewrite it using Left Join for better performance. Any help/guidance is greatly appreciated.

Aryan
  • 117
  • 1
  • 2
  • 11
  • 2
    `NOT IN` to `LEFT JOIN / IS NULL` (or to `NOT EXISTS`) will not help much if you have no indexes to be used by the query. – ypercubeᵀᴹ Jun 21 '13 at 07:14
  • 3
    A LEFT JOIN / IS NULL is usually slower then NOT IN/NOT EXISTS anyway. As @ypercube said, this is an index problem, which is not helped by the CAST and fucntions on predicates which invalidates index use anyway. – gbn Jun 21 '13 at 07:16
  • 3
    Also `ISNUMERIC(ITEMID) = 1 AND CAST(ITEMID AS NUMERIC) >= 50000` doesn't do what you hope it will do, because a) there's no guarantee on the order in which predicates are evaluated, and b) just because `ISNUMERIC` returns 1, there's no guarantee that you can cast the value to a `NUMERIC`. – Damien_The_Unbeliever Jun 21 '13 at 07:19
  • 1
    As an example, try the following two select statements: `select ISNUMERIC('0d5')` and then `select CAST('0d5' as NUMERIC)`. – Damien_The_Unbeliever Jun 21 '13 at 07:21
  • 1
    So it looks like this query is supposed to find the lowest gap in the sequence of `ITEMID` that is `>50000`? What version of SQL Server are you on? And why do you need this? – Martin Smith Jun 21 '13 at 07:58
  • @Martin, I am on SQL 2008 R2. You are absolutely correct. it is supposed to find lowest gap > than a number. If you can think better way of writing this, that would be a great help. – Aryan Jun 21 '13 at 08:15
  • 1
    @Aryan - I was hoping 2012 for `LAG/LEAD` – Martin Smith Jun 21 '13 at 08:21
  • @MartinSmith I had the same thought (about the LAG/LEAD.) Perhaps a query with the NOT IN/NOT EXISTS against a Numbers table would be more efficient than the self semijoins. – ypercubeᵀᴹ Jun 21 '13 at 12:16
  • @ypercube - Yes quite possibly. The self joined CTEs will perform multiple evaluations of the underlying inefficient query. – Martin Smith Jun 21 '13 at 12:35

4 Answers4

6

Try this one -

;WITH cte AS 
(
     SELECT DISTINCT ITEMID = 
                CASE WHEN ISNUMERIC(ITEMID) = 1 
                    THEN ITEMID 
                END
     FROM Items
)
SELECT TOP 1 ITEMID = ITEMID + 1
FROM cte t
WHERE ITEMID >= 50000
     AND NOT EXISTS(
          SELECT 1
          FROM cte t2
          WHERE t.ITEMID + 1 = t2.ITEMID
     )
ORDER BY t.ITEMID
Devart
  • 119,203
  • 23
  • 166
  • 186
  • Possible variation: moving the `WHERE ITEMID >= 50000` inside the cte. – ypercubeᵀᴹ Jun 21 '13 at 11:32
  • This query can still fail for non numeric source data. [SQL Fiddle](http://sqlfiddle.com/#!3/aa418/1) You cannot rely on the `CAST` happening after the `WHERE` – Martin Smith Jun 21 '13 at 12:36
  • @Devart thanks I am happy with the result with your solution. It has improved a lot. I know I can not expect more than this. Though there is room for non numeric data, I have not used it so far and not going to use it in future for performance reason. Edited it accordingly and perfect. – Aryan Jun 21 '13 at 21:52
  • @MartinSmith thank you very much for your thoughts. You are amazing and have been great help – Aryan Jun 21 '13 at 22:00
5

As mentioned in the comments, the NOT EXISTS version of the query is usually faster in SQLServer than the LEFT JOIN - for completeness, here's both versions:

Left join variant of existing query:

with cte as
(SELECT CAST(it.ITEMID AS NUMERIC) ITEMID 
 FROM Items
 WHERE ISNUMERIC(ITEMID) = 1)
select top 1 i.ITEMID + 1 ITEMID
FROM cte i
LEFT JOIN cte ni ON i.ITEMID + 1 = ni.ITEMID
WHERE i.ITEMID >= 50000 AND ni.ITEMID IS NULL

Not exists variant of existing query:

with cte as
(SELECT CAST(it.ITEMID AS NUMERIC) ITEMID 
 FROM Items
 WHERE ISNUMERIC(ITEMID) = 1)
select top 1 i.ITEMID + 1 ITEMID
FROM cte i
WHERE i.ITEMID >= 50000 AND NOT EXISTS
(SELECT NULL 
 FROM cte ni 
 WHERE i.ITEMID + 1 = ni.ITEMID)
4

As @gbn pointed at the comments, the CAST and functions on predicates which invalidates index use anyway, so there is no point in converting this from NOT IN to LEFT JOIN / IS NULL or to NOT EXISTS. And NOT EXISTS usually performs better than LEFT NULL in SQL-Server.

NOT IN is not advised due to the problems (wrong, unexpected results) when there are nulls (in the compared columns or produced by the expressions) and the inefficient plans because of the nullability of the columns/expessions.

And ISNUMERIC() is not doing always what you think it does (as @ Damien_The_Unbeliever noted in another comment.) There are cases where the IsNumeric result is 1 but the cast fails.

So, the sane thing to do would be - in my opinion - to add another column in the table and convert (the values that can be converted) to numeric and store them in that column. Then you could write the query without casting and an index on that column could be used.

If you cannot alter the tables in any way (by adding a new column or a materialized view), then you can try and test the various rewritings the other answers offer.

ypercubeᵀᴹ
  • 113,259
  • 19
  • 174
  • 235
  • Thanks @ypercube for your valuable suggestion. Unfortunately I cannot change the underlying table. All I have to do is find a way to make that query a bit better. – Aryan Jun 21 '13 at 07:56
  • `NULL` won't be rejected by the `CAST` and the result of most computed expressions is regarded as `NULL`-able by SQL Server so the `NOT IN` plan will have the extra inefficiencies when compared to `NOT EXISTS` (it is the `NULL`-ability of the columns not the presence of `NULL`s that affects the plan) – Martin Smith Jun 21 '13 at 08:07
  • @MartinSmith Correct, the nullability affects plans. I was referring only to the possible wrong results when there are nulls. I'll add that, too. – ypercubeᵀᴹ Jun 21 '13 at 11:26
  • @MartinSmith With the "rejected" I meant that the nulls will not pass the `CAST(ITEMID AS NUMERIC) >= 50000` condition so the query will show same results as a Not Exists version. Am I wrong on this? – ypercubeᵀᴹ Jun 21 '13 at 11:35
  • 1
    @ypercube - The result of the computed expression is still regarded as `NULL`-able despite the `WHERE`. It can in fact return `NULL` as here `SET ANSI_WARNINGS, ARITHABORT OFF; WITH T(ITEMID) AS(SELECT 1E50) SELECT CAST(ITEMID AS NUMERIC) FROM T WHERE ISNUMERIC(ITEMID) = 1` – Martin Smith Jun 21 '13 at 12:00
4

I agree with @ypercube that the sane thing to do is to fix your schema.

If for some reason this is not an option maybe materialising the whole thing into an indexed temporary table at runtime would make the best of a bad job.

CREATE TABLE #T
(
ITEMID NUMERIC(18,0) PRIMARY KEY
                     WITH ( IGNORE_DUP_KEY = ON)
)    

INSERT INTO #T  
SELECT CASE WHEN ISNUMERIC(ITEMID) = 1 THEN ITEMID END
FROM Items
WHERE CASE WHEN ISNUMERIC(ITEMID) = 1 THEN ITEMID END >= 50000  


SELECT TOP 1 ITEMID+1
FROM #T T1
WHERE NOT EXISTS (SELECT * FROM #T T2 WHERE T2.ITEMID = T1.ITEMID +1)
ORDER BY ITEMID
Martin Smith
  • 438,706
  • 87
  • 741
  • 845
  • thanks I have not tested this but would like to do it on Monday – Aryan Jun 21 '13 at 22:03
  • @Aryan - Yes without actually testing on your data it is difficult to know which will perform better, if the self joined CTEs are doing a merge join I would expect this to be better as they will have to do a full scan and sort twice. If nested loops probably depends how many rows need to be scanned before the `TOP 1` is found. – Martin Smith Jun 21 '13 at 22:06