
I have this SQL query, but I've found that it can take up to 11 seconds to run. I'm really confused because when I change the date selection to a 2018 date, it returns instantly.

Here's the query:

select
    cv.table3ID, dm.Column1, dm.Column2, mm.Column1, 
    convert(varchar, cv.Date, 107) as Date, 
    mm.table2ID, dm.table1ID, mm.Column2,
    count(ctt.table4ID) as Total
from
    table1 dm
inner join 
    table2 mm on mm.table2ID = dm.table1ID
inner join 
    table3 cv on cv.table3ID = mm.table2ID
left join 
    table4 ct on ct.table4CVID = cv.table3ID
inner join 
    table4 ctt on ctt.table4MMID = mm.table2ID
where
    ctt.table4Date >= '2019-01-19'
    and ct.table4CVID is null  
    and dm.Column1 like '%Albert%'
    and cv.Column1 = 39505 
    and cv.Status = 'A'  
group by
    cv.table3ID, dm.Column1, dm.Column2, mm.Column1, 
    cv.Date, mm.table2ID, dm.table1ID, mm.Column2

I've found that when I execute that query with ctt.table4Date >= '2018-01-19', the response is immediate. But with '2019-01-19', it takes 11 seconds.

Initially, when I found that the query took 11 seconds, I assumed it had to be an indexing issue, but I'm no longer sure it's got anything to do with the index, since the query runs fast for an older date.

I've looked at the execution plan for the query with the different dates and they look completely different.

Any thoughts on why this might be happening? Does it have anything to do with updating the statistics?
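One way I could test the statistics theory is to check when the statistics on table4 were last updated and how many rows have changed since. This is just a sketch, assuming SQL Server 2008 R2 SP2 or later (for sys.dm_db_stats_properties) and that table4 is in the dbo schema:

-- Check statistics freshness on table4 (sketch; adjust schema/table name)
SELECT
    s.name AS stats_name,
    sp.last_updated,
    sp.rows,
    sp.rows_sampled,
    sp.modification_counter  -- rows modified since the last stats update
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID('dbo.table4');

-- If modification_counter is large relative to rows, refresh them:
UPDATE STATISTICS dbo.table4 WITH FULLSCAN;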

[Update]

The image below compares the execution plans for 2018 and 2019 for table4 ctt. According to the execution plan, this operator accounts for 43% of the cost in 2018 and 45% in 2019. [Image: execution plan comparison of table4 ctt, 2019 vs 2018. Top is 2019, bottom is 2018.]

This image compares the execution plans again for table4 as ct; same layout, top is 2019 and bottom is 2018. [Image: execution plan comparison of table4 ct, 2019 vs 2018. Top is 2019, bottom is 2018.]

[Update 2]

Here are the SQL Execution Plans:

When using '2018-01-19' as the date: https://www.brentozar.com/pastetheplan/?id=SyUh8xXQV

When using '2019-01-19' as the date: https://www.brentozar.com/pastetheplan/?id=rkELW1Q7V

Sathya
  • Kindly share the execution plan for '2019-01-19' – Muhammad Waheed Jan 21 '19 at 05:15
  • @MuhammadWaheed the entire plan? I can't add the entire plan. It's too long. – Sathya Jan 21 '19 at 05:34
  • https://www.brentozar.com/pastetheplan/ – Dale K Jan 21 '19 at 05:52
  • First thing I see is actual and estimated rows are very different in some of the pictures. This is often due to stale statistics. You don't mention how many rows in this table, how often it is updated, and in particular, are new rows usually added for recent dates? – Nick.Mc Jan 21 '19 at 05:57
  • @MuhammadWaheed here's the plan: https://www.brentozar.com/pastetheplan/?id=rkELW1Q7V – Sathya Jan 21 '19 at 06:07
  • @Nick.McDermaid, the table has many rows, more than 200,000, and yes, rows are added for recent dates. As for how often it is updated, I'm not sure about the table statistics update schedule; nothing is specified or scheduled. But rows are inserted and updated regularly, daily. – Sathya Jan 21 '19 at 06:07
  • Once a day or many times a day? once a minute? Perhaps there is contention around more recent dates. – Nick.Mc Jan 21 '19 at 06:21
  • @Nick.McDermaid, rows are inserted many times a day, possibly a minute, definitely many times an hour. I'm not sure what you mean by contention around more recent dates. – Sathya Jan 21 '19 at 06:30
  • If you are trying to both write and read to the same database page (which might hold all the same dates, depending on what your clustered index is), at the same time, that's contention. But if you are only inserting many times an hour, it's probably not contention. – Nick.Mc Jan 21 '19 at 06:33
  • @Nick.McDermaid, thanks for all your input. From what you last said, I tried the same query on an instance of the database that doesn't have any reads or writes going on and the query still took a long time. – Sathya Jan 21 '19 at 07:18
  • That's good - that rules out contention as an issue – Nick.Mc Jan 21 '19 at 07:20

1 Answer


The problem is most likely that more rows are being returned from the other tables. The operator you linked in your [update] as a clustered index scan is actually a clustered index seek.

Note, however, that the index seek is executed 144 times, and the actual number of rows read runs to eight digits, which is what causes the slow response.


I'm guessing that when the query runs fast for you, the actual number of executions on this table is 1. The 144 executions are what's killing you here, given the imprecise seek predicates. If you know which query plan works for you, and the indexes to back it up already exist, you can use FORCESEEK and explicit join hints to force the joins in a particular order.
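As a rough sketch of what I mean (the join order here is an assumption; the select list is abbreviated, and a LOOP join hint between two tables also fixes their relative order):

select cv.table3ID, count(ctt.table4ID) as Total  -- rest of select list as in the question
from table1 dm
inner loop join table2 mm on mm.table2ID = dm.table1ID
inner loop join table3 cv on cv.table3ID = mm.table2ID
left loop join table4 ct with (forceseek) on ct.table4CVID = cv.table3ID
inner loop join table4 ctt with (forceseek) on ctt.table4MMID = mm.table2ID
where ctt.table4Date >= '2019-01-19'
  and ct.table4CVID is null
  and dm.Column1 like '%Albert%'
  and cv.Column1 = 39505
  and cv.Status = 'A'
group by cv.table3ID
option (force order);  -- joins are performed in the order written

FORCESEEK only helps if the indexes on table4 actually support a seek for these predicates; verify that in the plan before relying on it.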

Edit

I took a look at the shared plans. Changing the date to 2018 is faster for you because SQL Server switches to a Hash Match in place of a loop join, given the amount of data being processed.
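If the hash-match plan is the one you want regardless of the date, the simplest sketch is a query-level hint. Note that this forces a hash match for every join in the query, not just table4 ctt, so compare both dates before committing to it (select list abbreviated):

select cv.table3ID, count(ctt.table4ID) as Total  -- rest of select list as in the question
from table1 dm
inner join table2 mm on mm.table2ID = dm.table1ID
inner join table3 cv on cv.table3ID = mm.table2ID
left join table4 ct on ct.table4CVID = cv.table3ID
inner join table4 ctt on ctt.table4MMID = mm.table2ID
where ctt.table4Date >= '2019-01-19'
  and ct.table4CVID is null
  and dm.Column1 like '%Albert%'
  and cv.Column1 = 39505
  and cv.Status = 'A'
group by cv.table3ID
option (hash join);  -- every join in this query becomes a hash match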

divyanshm
  • Hi, thanks for your input. I get what you're saying, but the problem I'm having is that when I change condition date to '2018-01-19', the number of executions is only 2 and the actual number of rows is only 18. This is the execution plan with that date: https://www.brentozar.com/pastetheplan/?id=SyUh8xXQV – Sathya Jan 21 '19 at 07:37
  • Alright, the reason here is the HashMatch then. When you query for 2018, the number of rows is so large that SQL decides to do a HashMatch and not a loop join. You can either hint the engine to always do that, OR fix up the join order in the query by explicitly writing Table1 INNER LOOP JOIN Table2 .... and forcing seeks on the correct indexes. – divyanshm Jan 21 '19 at 08:17
  • Do note that hash matches take more CPU. If the hash match works better for you, keep in mind that the query will use more CPU to run. – divyanshm Jan 21 '19 at 08:17
  • Thanks @divyanshm I forced the loop join with table4 as ctt and that made the query return instantly! But I still don't understand why there is more data with the 2019 date over the 2018 date. – Sathya Jan 21 '19 at 08:28
  • SQL estimates that you are reading more data with the 2018 date; hence it forces a hash match, because that is the fastest way to join two large inputs. With the 2019 date the engine thinks a loop join will be faster, so it tries the loop join (for each row from the top input, retrieve rows from the bottom table), and your seek predicate is not precise enough to return just the rows you need. – divyanshm Jan 21 '19 at 08:36
  • You can try fixing it in a few ways but that was not a part of the question :) and it would need more insights on the sort of application, shape of data and relations between the tables. – divyanshm Jan 21 '19 at 08:39