Query optimization from from query execution plan

Question

I have not found any suitable way to show query plan other than image, so i added image. in image i got the execution plan and i want to reduce fullouter join cost

, if any one suggest me the way of reducing cost it would be great for better query plan link

WITH cte AS
(
SELECT
 coalesce(fact_connect_hours.dimProviderId,fact_connect_hour_hum_shifts.dimProviderId,fact_connect_hour_clock_times.dimProviderId)
  as dimProviderId,
  coalesce(fact_connect_hours.dimScribeId,fact_connect_hour_hum_shifts.dimScribeId,fact_connect_hour_clock_times.dimScribeId)
  as dimScribeId
 ,coalesce(fact_connect_hours.dimDateId,fact_connect_hour_hum_shifts.dimDateId,fact_connect_hour_clock_times.dimDateId)
  as dimDateId
,factConnectHourId
,totalProviderLogTime
,providerFirstJoinTime
,providerLastEndTime
,scribeFirstLogin
,scribeLastLogout
,totalScribeLogTime
, totalScopeTime
, totalStreamTime
, firstScopeJoinTime
, lastScopeEndTime
, scopeLastActivityTime
, firstStreamJoinTime
, lastStreamEndTime
, streamLastActivityTime

,fact_connect_hour_hum_shifts.shiftStartTime
,fact_connect_hour_hum_shifts.shiftEndTime
,fact_connect_hour_hum_shifts.totalShiftTime
,fact_connect_hour_clock_times.ClockStartTimestamp
,fact_connect_hour_clock_times.ClockEndTimestamp
,fact_connect_hour_clock_times.totalClockTime
,fact_connect_hour_hum_shifts.shiftTitle
,fact_connect_hours.dimStatusId
,dim_status.status
FROM fact_connect_hours
INNER JOIN  dim_status on fact_connect_hours.dimStatusId=dim_status.dimStatusId
  full outer JOIN fact_connect_hour_hum_shifts
 ON ( fact_connect_hour_hum_shifts.dimDateId=fact_connect_hours.dimDateId 
  and fact_connect_hour_hum_shifts.dimProviderId=fact_connect_hours.dimProviderId 
  and fact_connect_hour_hum_shifts.dimScribeId=fact_connect_hours.dimScribeId)
full outer join fact_connect_hour_clock_times
    on (fact_connect_hours.dimDateId = fact_connect_hour_clock_times.dimDateId 
        and fact_connect_hours.dimProviderId= fact_connect_hour_clock_times.dimProviderId 
        and fact_connect_hours.dimScribeId = fact_connect_hour_clock_times.dimScribeId
       )
WHERE coalesce(fact_connect_hours.dimDateId,fact_connect_hour_hum_shifts.dimDateId,fact_connect_hour_clock_times.dimDateId)>=732
) SELECT cte.*
  ,dim_date.tranDate
,dim_date.tranMonth
,dim_date.tranMonthName
,dim_date.tranYear
,dim_date.tranWeek
,dim_scribe.scribeUId
,dim_scribe.scribeFirstname
,dim_scribe.scribeFullname
,dim_scribe.scribeLastname
,dim_scribe.location
,dim_scribe.partner
,dim_scribe.beta
,dim_scribe.currentStatus
,dim_scribe.scribeEmail
,dim_scribe.augmedixEmail
,dim_scribe.partner
,dim_provider.scribeManager
,dim_provider.clinicalAccountManagerName
,dim_provider.providerUId
,dim_provider.beta
,dim_provider.accountName
,dim_provider.accountGroup
,dim_provider.accountType
,dim_provider.goLiveDate
,dim_provider.siteName
,dim_provider.churnDate
,dim_provider.providerFullname
,dim_provider.providerEmail

  from cte

   INNER JOIN   dim_date on cte.dimDateId=dim_date.dimDateId
   inner JOIN  aug_bi_dw.dbo.dim_provider AS dim_provider on cte.dimProviderId=dim_provider.dimProviderId 
   inner join aug_bi_dw.dbo.dim_scribe AS dim_scribe on cte.dimScribeId=dim_scribe.dimScribeId 

where dim_date.dimDateId>=732

You can paste the plan to Brent Ozar: http://pastetheplan.com — Nițu Alexandru, Mar 13 '19 at 11:54
*"I have not found any suitable way to show query plan other than image, so i added image"* SQL Server Management Studio can save a XML format of a plan it [seams](https://learn.microsoft.com/en-us/sql/relational-databases/performance/save-an-execution-plan-in-xml-format?view=sql-server-2017) — Raymond Nijland, Mar 13 '19 at 12:00
@RaymondNijland that is so long way coding xml thats why i had not added xml plan — Zaynul Abadin Tuhin, Mar 13 '19 at 12:02
Also you can use [SET SHOWPLAN_TEXT](https://learn.microsoft.com/en-us/sql/t-sql/statements/set-showplan-text-transact-sql?view=sql-server-2017) — Raymond Nijland, Mar 13 '19 at 12:04
Looks like the full outer joins are the costliest operations. Are you sure you absolutely need them? — Salman A, Mar 13 '19 at 12:13
also to add to @SalmanA 's the (full outer) joins in the CTE really doesn't have a filter so it will scan all data.. — Raymond Nijland, Mar 13 '19 at 12:13
Salman a yes I need full outer join and have to reduce that cost — Zaynul Abadin Tuhin, Mar 13 '19 at 12:14
You wrote `INNER JOIN dim_date on cte.dimDateId=dim_date.dimDateId` and `where dim_date.dimDateId>=732` which means you can also write and use that same filter conditon in the CTE maybe that helps to limit the costs without table structures it's hard to know for sure. — Raymond Nijland, Mar 13 '19 at 12:15
That means adding the filter in the CTE `fact_connect_hours.dimDateId >= 732 OR fact_connect_hour_hum_shifts.dimDateId >= 732 OR fact_connect_hour_clock_times.dimDateId >= 732`.. — Raymond Nijland, Mar 13 '19 at 12:21
Raymond main costly part is full outer join other than that filter optimization is known case so I can manage that part — Zaynul Abadin Tuhin, Mar 13 '19 at 12:26
*"Raymond main costly part is full outer join"* i know *"I need full outer join and have to reduce that cost"* Well a [Indexed View](https://learn.microsoft.com/en-us/sql/relational-databases/views/create-indexed-views?view=sql-server-2017) comes to mind, a [indexed view](https://learn.microsoft.com/en-us/previous-versions/sql/sql-server-2008/dd171921(v=sql.100)) (a materialized view) of those `full outer join`.should allow to run the query faster.. Notice the last link is mentioning ***Performance Gains from Indexed Views -> Tables can be prejoined and the resulting data set stored*** — Raymond Nijland, Mar 13 '19 at 12:47
You should get the actual execution plan (not estimated) and upload it to paste the plan. There is lots of useful info in the plan that can't be determined from the image — Martin Smith, Mar 13 '19 at 13:37
@MartinSmith https://www.brentozar.com/pastetheplan/?id=SJ3zttIvE — Zaynul Abadin Tuhin, Mar 13 '19 at 13:45
That's the estimated plan. Not actual. So any potential issues with cardinality estimates and hash spills aren't shown to us. But from a quick glance given the semantics of your query that plan is quite reasonable. It has to full outer join those tables in their entirety before it can apply the predicate and the filter is pushed down as close as possible to that. What is the problem you are trying to solve? Execution time? If so how long does it currently take to run and what is the target? — Martin Smith, Mar 13 '19 at 14:07
The table cardinalities shown in the plan are pretty small for a DW (`3,277`, `3,653`, `1,183`, `146,919`, `105,651`, `6`, `442,252`). Are these production volumes or test data? — Martin Smith, Mar 13 '19 at 14:09
@MartinSmith this query is taking 4 minute where record set is only 5lac we read data from aws sql server instance target is less than 1minute — Zaynul Abadin Tuhin, Mar 13 '19 at 14:23
Does the actual plan show any spills or big cardinality mis estimates? What wait types are shown for that session when it runs? Is it being heavily resource governed by AWS? — Martin Smith, Mar 13 '19 at 14:32

score 2 · Answer 1 · answered Mar 13 '19 at 12:32

2

Based on the table names (dim* and fact*), I'll assume you are doing a report of sorts over a data warehouse schema. Assuming this is the case, then likely the best thing you can do to improve performance is to consider using Columnstore indexes (and batch mode execution which is implicit once you enable Columnstores). These indexes are heavily compressed and often give significant performance gains on IO-bound workloads. Fact tables are the usual candidates as they are largest/often don't fit in the buffer pool.

Columnstores are supported in all editions in SQL 2016 onwards and go faster in Enterprise Edition (more parallelism, faster operations internally like using SIMD instructions, etc.). Please note that they don't directly support primary keys, so this may impact how you lay out the tables a bit. You can create keys (as b-tree secondary indexes internally), so some of the space savings are lost if you use primary keys. Often, fact tables + columnstores also use partitioning to get another layer of filtering without secondary indexes.

Please consider trying your query again with columnstores replacing the fact tables (perhaps on a copy of your database to do an experiment). When you look at the query plans that result, I suggest that you also look to see if the operators are running in batch mode. Batch mode operators are different than their row mode counterparts. The batch mode ones are optimized for the architectures of modern CPUs to minimize the amount of memory traffic in and out of the CPU. As a rough rule-of-thumb, 10x-100x difference is possible with columnstores + batch mode.

answered Mar 13 '19 at 12:32

Conor Cunningham MSFT

4,151
1
15
21

how it is the answer to this question – Zaynul Abadin Tuhin Mar 13 '19 at 13:07
1

@ZaynulAbadinTuhin - you asked how to reduce the cost. This answer advises that using columnstore will reduce the cost significantly. How is it not an answer to the question? – Martin Smith Mar 13 '19 at 13:31
@MartinSmith i want to improve my query and i mentioned that i need improvement of full outer join cost that's why i said how it is the answer yes cloumnstoes is alaws good for reading but i am finding improvement of my query – Zaynul Abadin Tuhin Mar 13 '19 at 13:55
if you have a data warehouse on a recent version of SQL, the best way to improve performance is to use columnstores + batch mode. The costs are just numbers to trade off different choices in the optimizer's search space. Using the row-mode operators instead of the batch-mode ones for this much data is just moving deck chairs around on the Titanic when you can use columnstores. The full outer join will be cheaper with columnstores almost certainly – Conor Cunningham MSFT Mar 13 '19 at 23:23
@ConorCunninghamMSFT lets forgot about column store index without that do you have any idea for improvement or is any significant area in my query that should i need to update – Zaynul Abadin Tuhin Mar 15 '19 at 08:17
I don't have any significant suggestions since I can't really understand the semantics of what you are asking of the database, sorry – Conor Cunningham MSFT Mar 15 '19 at 16:29

score 0 · Answer 2 · answered Mar 14 '19 at 12:57

The only filter that can help you is 'where dim_date.dimDateId>=X' And this comes up with a join to the cte and the cte field is composed out of 3 tables outer join themselves. For best performance i would choose to tell sql what to do step by step else it is very risky to perform with the best plan as is:

use 3 statements at tables fact_connect_hours, fact_connect_hour_hum_shifts and fact_connect_hour_clock_times using the filter and get results (just primary keys or all columns needed) into 3 temps like #fact_connect_hours, #fact_connect_hour_hum_shifts and #fact_connect_hour_clock_times
Use the statement as is but replace with the temps or use temps join real tables if temps only have the PK
Add indexes (if not present already) to columns fact_connect_hours.dimDateId, fact_connect_hour_hum_shifts.dimDateId and fact_connect_hour_clock_times.dimDateId

This way you make sure you filter directly what you need at the simplest steps possible, then the complicated query will work at a preset number of rows thus performance is guaranteed as very good vs very bad plan applied on a few rows is practically non important.

Lesser detail: Pay attention to 'INNER JOIN dim_status' - if there is no FK constraint the cardinality estimator may miss estimated returned rows as been unable to understand the relation between the tables.

I can also see an attempt for optimization as the filter had ascend upwards into the cte. This is a similar plan to what i propose at a lesser restriction. Using my plan will enforce rows searches to the core-root source.

Query optimization from from query execution plan

2 Answers2