Improving and solving long running query

Question

I have a stored procedure that is running unusually long.... Narrowing down. I got the below...

Long running query:

select @TimeStamp = getdate()
select @TimeStamp
Use FSFruit;
select *,
 case when @intDaysToGrowInQ1 = 0 then 0 else (me1) end yme1,
 case when @intDaysToGrowInQ2 = 0 then 0 else (me1 + me2) end yme2,
 case when @intDaysToGrowInQ3 = 0 then 0 else (me1 + me2 + me3) end yme3,
 case when @intDaysToGrowInQ4 = 0 then 0 else (me1 + me2 + me3 + me4) end yme4
into #temp1
from mq_fruit
where fruit_suffix **in** ('A1', 'O1', 'S1') **OR**
balance_type  **in** (select fruit_type from REF.dbo.fruit_type_tbl where fruit_type_group = 'TOT')

select @TimeStamp = getdate()
select @TimeStamp

total time it took is around 25 mins. (9095000 rows affected)

there is another almost similar query that runs faster

select @TimeStamp = getdate()
select @TimeStamp

Use FSFruit;
select *,
 case when @intDaysToGrowInQ1 = 0 then 0 else (me1) end yme1,
 case when @intDaysToGrowInQ2 = 0 then 0 else (me1 + me2) end yme2,
 case when @intDaysToGrowInQ3 = 0 then 0 else (me1 + me2 + me3) end yme3,
 case when @intDaysToGrowInQ4 = 0 then 0 else (me1 + me2 + me3 + me4) end yme4
into #temp1
from mq_fruit
where fruit_suffix **not in** ('A1', 'O1', 'S1') **and**
fruit_type  **not in** (select fruit_type from REF.dbo.fruit_type_tbl where fruit_type_group = 'TOT')

select @TimeStamp = getdate()
select @TimeStamp

total time it took is around 3 mins. (7080000 rows affected)

Trying to figure out why the difference. The first runs longer the second runs faster.

mq_fruit and fruit_type_tbl do not have any indexes.

What I am missing to make the first query faster in execution.

Thanks.

Updated: actual exec plan https://www.brentozar.com/pastetheplan/?id=rkU-3KIeo

The **ONLY** way to assist with performance is to provide 1) Table and index definitions + 2) The execution plan. The query does not provide performance information. — Dale K, Sep 07 '22 at 20:30
Can you add the actual execution plan (PasteThePlan); use *exists* against table `fruit_type_tbl`. — Stu, Sep 07 '22 at 20:36
not familiar with Actual execution plan... but seems I cannot attach it due to long xml file — simpleorchid, Sep 07 '22 at 21:06
Put it here https://www.brentozar.com/pastetheplan/ and share the link — David Browne - Microsoft, Sep 07 '22 at 21:14
@Dale what are the things i need to look for and attach the plan here... the generated actual execution plan is long. — simpleorchid, Sep 07 '22 at 21:16
Well the best option really is to take the time to learn to understand the execution plan - if you are going to be working with SQL, you are going to run into performance issues, and you'll be much better placed if you can work out how to solve them yourself. The next best thing is to try and simplify your problem i.e. experiment with removing parts of it to see which parts of your code contribute mostly to it being slow - that will reduce the execution plan. Finally, well you are looking for the bits which take a lot of time. — Dale K, Sep 07 '22 at 21:25
thanks @DaleK. where do you see that suggestions message. I am not seeing it. The two queries I posted does not show much difference other than the AND/OR/IN etc. but still it shows difference in total execution time. It uses the same tables and fields. almost same where clause part. Will it make any difference if I add index? — simpleorchid, Sep 07 '22 at 21:47
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/247859/discussion-between-simpleorchid-and-dale-k). — simpleorchid, Sep 07 '22 at 21:47
Do not `SELECT *` just select the columns you need. Separate indexes on `fruit_suffix` and `fruit_type` may help (with `INCLUDE` columns of any other necessary columns), although an `OR` style query is always hard, you may need to split it into two `UNION ALL` queries. — Charlieface, Sep 07 '22 at 22:07
The estimated 1 row coming out the first nested loops and so estimated 1 scan on [REF].[dbo].[fruit_type_tbl] vs the 35,139,362 scans that happen in reality (taking 18 minutes) isn't helping — Martin Smith, Sep 07 '22 at 22:09
Strange estimate as it has estimated 35139362 rows going into the outer join against the constant scan and they can't be reduced by an outer join but it estimates 1 coming out anyway — Martin Smith, Sep 07 '22 at 22:16
@DaleK - 89% of the **estimated** cost is attributed to a table scan of mq_fruit and 0% to the table scan of `fruit_type_tbl` - it is the second operator that is actually causing the main problem though. The estimated costs are largely best ignored — Martin Smith, Sep 07 '22 at 22:59
Thanks everyone for the help. Martin and Charlie thanks for the tips. You gave me some tips on how to improve. Yes.. I see big change when I modified the query without any additional index. — simpleorchid, Sep 09 '22 at 16:47

Martin Smith · Accepted Answer · 2022-09-08T00:33:26.957

The thing absolutely killing performance of your current query is the repeated scans of dbo.fruit_type_tbl. It is only small (1,047 rows) but the cumulative rows read from it is in excess of 25 billion due to the repetition.

The following will likely be a massive improvement as it will only scan it once.

SELECT *
FROM   (SELECT DISTINCT fruit_type
        FROM   dbo.fruit_type_tbl
        WHERE  fruit_type_group = 'TOT') ft
       RIGHT HASH JOIN mq_fruit mq
                    ON ft.fruit_type = mq.fruit_type
WHERE  mq.fruit_suffix IN ( 'A1', 'O1', 'S1' )
        OR ft.fruit_type IS NOT NULL

The query is never going to be that fast though as 17 million of the 35 million rows in mq_fruit do in fact meet one or the other condition so you are returning a lot of data to the client (especially with *).

The estimated row size is 362 bytes so that would be > 6GB if accurate. Your current query does have 108 seconds of ASYNC_NETWORK_IO waits that likely are largely attributable to this.

(I haven't bothered suggesting any indexing as fruit_type_tbl is small anyway and will only be scanned once and the large percentage of rows actually required from mq_fruit along with use of * and use of or means there is not a clear candidate index on that side.)

Improving and solving long running query

1 Answers1