Optimising SQL involving IF EXISTS

Question

I am trying to keep track of whether I need to update certain tables. I have an Events table which I am wanting to monitor for changes and have another table, called DictionaryRefresh, which keeps track of the changes made to that table. If the Events table gets edited, it's edittime will be saved and fall later than the last refresh time of the DictionaryRefresh table hence indicating the need to refresh. In addition, if new rows are added to the Events table, then there also needs to be associated new entries in the DictionaryRefresh table - hence the LEFT JOIN.

Here are the table structures

CREATE TABLE [dbo].[DictionaryRefresh]
(
    [LookupKey] [varchar](31) NOT NULL,
    [LookupValue] [varchar](255) NOT NULL,
    [RecordNumber] [int] NULL,
    [RefreshTime] [datetime] NULL,
    [EventKey] [varchar](31) NULL,
    [MappedLookupKey] [varchar](31) NULL
 ) ON [PRIMARY]

The indexes are (following the DBEngine Tuning Advisor)

CREATE NONCLUSTERED INDEX [idx_DictionaryRefresh2146B4EB] 
ON [dbo].[Ifx_DictionaryRefresh] ([LookupKey] ASC)

CREATE NONCLUSTERED INDEX [idx_DictionaryRefresh51EC6492] 
ON [dbo].[Ifx_DictionaryRefresh] ([MappedLookupKey] ASC, [RefreshTime] ASC, [RecordNumber] ASC, [EventKey] ASC)

CREATE NONCLUSTERED INDEX [idx_DictionaryRefreshFCDAD7FA] 
ON [dbo].[Ifx_DictionaryRefresh] ([LookupValue] ASC)

The Events table is as follows:

CREATE TABLE [dbo].[Events](
    [RecordNumber] [int] NOT NULL,
    ...
    [EventKey] [varchar](31) NOT NULL,
    ...
    [EditTime] [datetime] NULL,
    ...
PRIMARY KEY CLUSTERED([RecordNumber] ASC)

CREATE NONCLUSTERED INDEX [idxEvents299ADAC8] 
ON [dbo].[Events]([EditTime] ASC)

CREATE NONCLUSTERED INDEX [idxEvents5B151A5E] 
ON [dbo].[Events]([EventKey] ASC)

Now the SQL I am running is as follows - it is taking almost a minute to return. If I execute the subquery only it returns pretty much immediately.

IF EXISTS (
    SELECT 1
    FROM (
        SELECT 
            e.EventKey AS DictionaryKey
            ,ISNULL(e.EditTime, '1 Jan 1900 01:00') AS EditTime
            ,e.RecordNumber AS DictionaryRecordNumber
        FROM Events e) d
    LEFT JOIN DictionaryRefresh r ON r.RecordNumber = DictionaryRecordNumber
        AND r.EventKey = DictionaryKey
        AND r.MappedLookupKey = 'M18E2I501'
    WHERE r.RefreshTime < d.EditTime
        OR r.RecordNumber IS NULL)
BEGIN
    PRINT 'TRUE'
END

There are approx 130K rows in the DictionaryRefresh table, and approx 8K rows in the Events table

When the DictionaryRefresh table is empty or small it is very fast but slows down as the number of rows in DictionaryRefresh increases especially if none match the criteria.

Here is the execution plan.

and the salient stats (the index seek taking 94% of the cost - the number of rows accessed is effectively the square of the number of rows in the Events table) ...

I have tried replacing the

IF EXISTS

with

IF (SELECT COUNT ...) <> 0

as well as

IF (SELECT TOP 1 1 ...) = 1

but none seemed any faster.

I would obviously appreciate any suggestions you might have.

Thanks in advance.

S

Have you tried like this IF EXISTS (SELECT TOP 1 1 FROM events e LEFT JOIN dictionaryrefresh r ON r.RecordNumber = e.DictionaryRecordNumber AND r.EventKey = e.DictionaryKey AND r.MappedLookupKey = 'M18E2I501' WHERE r.RefreshTime < Isnull(e.EditTime, '1 Jan 1900 01:00') OR r.RecordNumber IS NULL OPTION (FAST 1)) BEGIN PRINT 'TRUE' END — StackUser, Jan 19 '16 at 14:12
Please provide the table structure of Events table as like Dictionaryrefresh — StackUser, Jan 19 '16 at 14:18
Thx for that. Unfortunately it took roughly the same length of time. — Simon Woods, Jan 19 '16 at 14:20
IF EXISTS is efficient as it only looks to see if a record exists or not. So if you have if exists (select top 1 1 from abc) or (select * from abc) it doesn't matter. — Dane, Jan 19 '16 at 14:43
There is no need to do `TOP `or `SELECT 1` inside an `EXISTS()` construction. The server knows to stop after the first record found. Anyway, what strikes me the most about your setup is that you seemingly don't have any *unique* indexes. Unless your data is rather exotic I'd suggest you put at least a Primary Key on the field(s) that make the record uniquely identifiable. — deroby, Jan 21 '16 at 14:22

score 2 · Answer 1 · answered Jan 21 '16 at 20:44

re-formatting your query a bit I come to this:

IF EXISTS ( SELECT 1 
              FROM (SELECT e.EventKey AS DictionaryKey
                          ,ISNULL(e.EditTime, '1 Jan 1900 01:00') AS EditTime
                          ,e.RecordNumber AS DictionaryRecordNumber
                      FROM Events e) d
              LEFT OUTER JOIN DictionaryRefresh r 
                           ON r.RecordNumber = d.DictionaryRecordNumber
                          AND r.EventKey = d.DictionaryKey
                          AND r.MappedLookupKey = 'M18E2I501'
             WHERE r.RefreshTime < d.EditTime
                OR r.RecordNumber IS NULL)
BEGIN
    PRINT 'TRUE'
END

I don't see a good reason for the sub-query on Events, so the equivalent query then becomes this:

IF EXISTS ( SELECT *
              FROM Events e
              LEFT OUTER JOIN DictionaryRefresh r 
                           ON r.RecordNumber = e.RecordNumber
                          AND r.EventKey = e.EventKey
                          AND r.MappedLookupKey = 'M18E2I501'
             WHERE r.RefreshTime < ISNULL(e.EditTime, '1 Jan 1900 01:00')
                OR r.RecordNumber IS NULL
         )
BEGIN
    PRINT 'TRUE'
END

First thing to notice is that you use r.RefreshTime in the WHERE clause. Since the < operator will only return true when the left side is DEFINED and is smaller than the right side, this means that each time r.RefreshTime is NULL the record will be skipped. Yet, the next line you obviously mention that you want all records where r.RecordNumber is NULL which can only happen when the value is actually NULL, or when the LEFT OUTER JOIN does not find a match. So there's a bit of a conflict here. Either you want to do an INNER JOIN, or you really want an OUTER JOIN but then need to move the r.RefreshTime < d.EditTime to the JOIN ON clause.

Now, looking at your table definitions, I think there is some room for improvement. Going by the explanation you gave above the Events table is 'the source' of all data. It gets appended over time and then once in a while you run a process that scans for 'new' and 'updated' records, does some magic and then updates the DictionorayRefresh (UPDATE existing records to a new RefreshTime and INSERT new ones as

[dbo].[Events]
- [EditTime] is defined as NUL-able. Maybe you consider NULL as 'record is inserted but was never updated' ? In that case I'd go for rather using '1 jan 1900' as a 'magic' value and make the field NOT NULLable, it makes life much easier later on.
[dbo].[DictionaryRefresh]
- I'm wondering why you'd want RecordNumber to be NULL-able? Shouldn't it always be filled in, otherwise what purpose would the record have ?
- You should also put a FOREIGN KEY on the field pointing to the Events table, that way the server knows all values come from there
- RefreshTime also is defined as NULL-able, again I think you'd want that to always be filled in. Otherwise how did the record get into the table?
- pretty sure you want MappedLookupKey, but that doesn't really matter all that much for this.

Anyway, back to the query. What you want to figure out is if there are records in Events that have a matching record in DictionaryRefresh for a given MappedLookupKey and newer EditTime than the corresponding RefreshTime. Or, that simply have no such a record (for this MappedLookupKey)

Personally I would write that like this:

IF EXISTS ( SELECT *
              FROM Events e
             WHERE NOT EXISTS ( SELECT *
                                  FROM DictionaryRefresh r 
                                 WHERE r.RecordNumber = e.RecordNumber
                                   AND r.EventKey = e.EventKey
                                   AND r.MappedLookupKey = 'M18E2I501'
                                   AND r.RefreshTime >= e.EditTime )

         )
BEGIN
    PRINT 'TRUE'
END

To make this work fast, you'll need the following indexes:

CREATE INDEX idx1 ON DictionaryRefresh  (MappedLookupKey, RecordNumber, EventKey, RefreshTime)

On the Events table I think the PK will do...

Fun fact: your JOIN uses both RecordNumber and EventKey (again a NULL-able field for probably no good reason). However, we already know that RecordNumber uniquely identifies a record in [Events] (it's the PK !), so if you join on just RecordNumber that actually should do, unless you can have different EventKey values in DictonaryRefresh ? That wouldn't make sense to me tbh... in fact, it seems that the field is not really needed in DictionaryRefresh as it can be found in Events in the first place. If that assumption is correct you can remove it from the table and the thus JOIN speeding things up a bit more again.

Bit of a long-read, hope I didn't goof up too much =)

wow. Thx. I'll need a bit of time to digest. Very helpful though thx again. — Simon Woods, Jan 22 '16 at 11:15

Devart · Accepted Answer · 2016-01-21T14:20:24.837

1

CREATE NONCLUSTERED INDEX ix1
    ON dbo.DictionaryRefresh (RecordNumber, EventKey, MappedLookupKey, RefreshTime)

CREATE NONCLUSTERED INDEX ix2
    ON dbo.[Events] (RecordNumber, EventKey, EditTime)

IF EXISTS (
    SELECT TOP(1) 1
    FROM dbo.[Events] e /*WITH(INDEX(ix2))*/
    LEFT JOIN dbo.DictionaryRefresh r /*WITH(INDEX(ix1))*/ ON r.RecordNumber = e.RecordNumber
        AND r.EventKey = e.EventKey
        AND r.MappedLookupKey = 'M18E2I501'
    WHERE (r.RefreshTime < e.EditTime AND e.EditTime IS NOT NULL)
        OR r.RecordNumber IS NULL
)
BEGIN
    PRINT 'TRUE'
END

edited Jan 21 '16 at 14:20

answered Jan 19 '16 at 15:18

Devart

119,203
23
166
186

Thx very much. So would you recreate the index every time you want to run the query or just check for existence and if not then create? Would you also be able to explain why you've done it that way. Thx again – Simon Woods Jan 19 '16 at 15:30
Don't need to recreate indexes... I just change the order for your index columns. Does this help? – Devart Jan 19 '16 at 15:39
Initial testing seems to indicate it is helping ... need to test it a bit more .. but thx vm again ... get back to you shortly – Simon Woods Jan 19 '16 at 15:43
Little mistake `e.EditTime IS NULL` -> `e.EditTime IS NOT NULL` – Devart Jan 19 '16 at 16:00
1

To be honest, I'd advice against INDEX hints. If the indexes are useful then the Query Optimizer will find and use them; otherwise it probably will prefer another index and/or approach. As it is right now you're essentially making it impossible for the optimizer to come up with a better plan. (it probably is optimal NOW, and it probably will stay like that in the future... or it might not... why take the chance?) – deroby Jan 21 '16 at 14:18
Thx. What would you suggest? Without Index hints, current perf isn't acceptable. I'd be interested to know any further suggestions. – Simon Woods Jan 21 '16 at 17:36
1

I'd suggest to add unique keys to your data first and then from there go further. Your query isn't all that complex, I can't imagine it would confuse the Query Optimizer all that much. I'll start a new answer type some suggestions. – deroby Jan 21 '16 at 19:37

Optimising SQL involving IF EXISTS

2 Answers2