I am trying to improve performance of a complex stored procedure. The bottleneck is this giant query with over 30 joins (most of them left) and a ridiculous amount of criteria in the WHERE statement. In addition, the SELECT portion contains calls to multiple functions that do significant work as well.

Pseudo Example:

select fn_DoWork1(caseID, orderID) as cln1,
       fn_DoWork2(caseID, orderID) as cln2,
       ... 20 other columns
       BalanceDue
from tbl1
   left join tbl2 on ...
   ...
   left join tbl30 on ...
where 
   (tbl10.ArrivalDate between @foo1 and @foo2)
   and ...
   and (@prmProcessingID = 0 OR tbl10.ProcessingID = @prmProcessingID)

You get the idea. I broke up the query into smaller portions, depositing data into temp tables along the way, so that each successive query has to work on a smaller subset. But at the end of the day, it still matches thousands of records, so performance didn't improve as much as I hoped.

My next thought was to return only 1 page of records at a time (20 records per page), since that is all the user needs to see. So I added `OFFSET x ROWS` and `FETCH NEXT 20 ROWS ONLY`, and that fixed the performance problem.
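
For reference, a minimal sketch of that paging clause (the ordering column and page variable are illustrative); note that OFFSET/FETCH is only valid together with an ORDER BY:

select ...
from tbl1
   left join tbl2 on ...
where ...
order by tbl1.caseID                     -- OFFSET/FETCH requires a deterministic ORDER BY
offset (@prmPageNumber - 1) * 20 rows    -- skip the earlier pages
fetch next 20 rows only;                 -- return a single 20-row page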

However, the problem now is that I return 20 rows, but I have no idea how many rows match the criteria in total. And thus, I can't tell the user how many pages of data there are - e.g. I can't render the pager on the UI properly.

Is there a better way to approach this problem?

AngryHacker
  • That final clause suggests you have a catch-all/kitchen sink query; do you have a `RECOMPILE` in the definition anywhere? Otherwise you're likely suffering (severe) parameter caching issues. – Thom A Jul 06 '23 at 21:31
  • [Old but worth a look](https://www.sqlservercentral.com/articles/optimising-server-side-paging-part-ii) if you haven't read it; it's specifically relevant to your question, and everything by PW is worth a read! – Stu Jul 06 '23 at 21:31
  • @ThomA Yes, `RECOMPILE` is definitely something I will address. But 99% of the problem is the query. – AngryHacker Jul 06 '23 at 22:47
  • Consider not using rowset pagination in the first place. Keyset pagination is a much better solution to the whole paging problem, but it requires a rethink of the whole application design, all the way to the user. See [Is there any better option to apply pagination without applying OFFSET in SQL Server?](https://stackoverflow.com/a/70520457/14868997) Furthermore, scalar functions are a known performance problem: remove them wherever possible, and maybe replace them with inline table-valued functions. Remove any join that doesn't need to be there, as well as any `select` column that isn't needed. – Charlieface Jul 07 '23 at 00:52
  • One way to improve the query is to fetch only the IDs of all the rows, for counting purposes. This avoids the expensive functions and perhaps some of the joins. Then, once you have your IDs, you can do the paging logic and discard those not needed. Finally, you run the real query where you join in all the data and do all the function calls (see the sketch after these comments). – siggemannen Jul 07 '23 at 08:16
  • @siggemannen I've done that to the extent that I can. The main problem with this approach is that there are lots of criteria in the WHERE statement which necessitate bringing in tables in the FROM/JOIN portion. But yes, that approach and bringing one page back at a time knocked down the time from 2500ms to 750ms. The goal is to be under 200ms. – AngryHacker Jul 07 '23 at 17:28
  • For the rest, you'd need to include your query plan, methinks. – siggemannen Jul 07 '23 at 18:07
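
A minimal sketch of siggemannen's two-step approach (the temp table, key columns, and page variable are illustrative): the first query carries only the joins the WHERE clause needs and no scalar functions, and computes the total match count with COUNT(*) OVER (), which is evaluated before OFFSET/FETCH trims the result:

-- Step 1: cheap key-only query - no scalar functions, only the joins the criteria need
select tbl1.caseID, tbl1.orderID,
       count(*) over () as TotalRows          -- total matches, for rendering the pager
into #pageKeys
from tbl1
   left join tbl10 on ...
where (tbl10.ArrivalDate between @foo1 and @foo2)
   and (@prmProcessingID = 0 OR tbl10.ProcessingID = @prmProcessingID)
order by tbl1.caseID
offset (@prmPageNumber - 1) * 20 rows
fetch next 20 rows only;

-- Step 2: full query - the wide joins and function calls now touch only 20 keys
select fn_DoWork1(k.caseID, k.orderID) as cln1,
       fn_DoWork2(k.caseID, k.orderID) as cln2,
       k.TotalRows
from #pageKeys k
   left join tbl2 on ...;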

2 Answers


> Is there a better way to approach this problem?

Counting the rows requires evaluating the JOINs and WHERE clause over the whole dataset, so it is almost as expensive as running the full query and storing the results.

Your choices are to run the full query and cache the results, or adopt a UX that doesn't display the total number of pages.
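
For the first option, a minimal sketch (the temp table and variable names are illustrative): materialize the matching rows once, capture the total from @@ROWCOUNT, then serve pages from the cached set so the expensive functions only ever run against 20 rows:

-- Run the expensive joins and criteria once and cache the matches
select tbl1.caseID, tbl1.orderID, BalanceDue
into #matches
from tbl1
   left join tbl2 on ...
where ...;

declare @totalRows int = @@ROWCOUNT;   -- total match count, for rendering the pager

-- Page from the small cached set
select fn_DoWork1(m.caseID, m.orderID) as cln1,
       fn_DoWork2(m.caseID, m.orderID) as cln2,
       m.BalanceDue
from #matches m
order by m.caseID
offset (@prmPageNumber - 1) * 20 rows
fetch next 20 rows only;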

David Browne - Microsoft

Try breaking the query into multiple WITH statements and deferring the function calls to the last possible query, as a way to simplify the structure of the query. Logically, they would be the same as one big query.

David Browne is correct: if you do not need to display a row count in the front-end UI, don't compute it.

We built up the WITH statements in this order:

WITH customerData AS
(
   SELECT ...   -- customer data fields used, plus joined-in lookup data
   FROM ...
   WHERE ...    -- filter to only active customers
),
purchaseOrders AS
(
   SELECT ...   -- purchase order fields
   FROM ...
   WHERE ...    -- customer ID is in customerData and date is within range A to B
),
...

The final query then joins in the relevant WITH statements and applies the function calls, CASE expressions, etc.

The nice thing about it is you can slowly refactor the original query into a series of simpler queries.
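
A minimal sketch of the deferral idea (names are illustrative): page down to 20 rows inside the CTE chain, so the scalar functions only ever run against the final page:

WITH filtered AS
(
   -- all of the cheap filtering and joining happens here; no scalar functions yet
   SELECT caseID, orderID, BalanceDue
   FROM ...
   WHERE ...
),
pageRows AS
(
   SELECT caseID, orderID, BalanceDue
   FROM filtered
   ORDER BY caseID
   OFFSET (@prmPageNumber - 1) * 20 ROWS
   FETCH NEXT 20 ROWS ONLY
)
-- the expensive functions now see 20 rows, not thousands
SELECT fn_DoWork1(p.caseID, p.orderID) AS cln1,
       fn_DoWork2(p.caseID, p.orderID) AS cln2,
       p.BalanceDue
FROM pageRows p;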

snj
  • These are all suggestions; there are too many variables to consider from such a short question. – snj Jul 07 '23 at 00:04
  • `WITH` statement? More commonly known as a "Common Table Expression" or CTE. However, CTEs do not improve performance; in fact, they often make performance worse. Actually splitting a query into multiple temp tables can often help performance. – Dale K Jul 07 '23 at 00:59