4

I have a SQL Server query with an INNER JOIN between a large table containing 1 million rows (Acct) and a smaller table containing 10,000 rows (AcctTxns), but SQL Server produces an execution plan with wrong cardinality estimates.

I've boiled down the problem to the following statement:

SELECT p.AcctNo, p.Balance + t.TotalAmt as New Balance
    FROM Acct p JOIN AcctTxns t 
    ON p.AcctNo = t.AcctNo

Image - Nested Loops operator with wrong Estimated Number of Rows

Full Execution Plan

The Nested Loops operator shows an "Estimated Number of Rows" of 16.2588 vs "Actual Number of Rows" of 10000.

I am using MS SQL Server 2016 (13.0.1742.0).

I've tried a number of fixes, including:

  1. updating statistics
  2. using temp tables for intermediate results
  3. turning off the 2014 cardinality estimator
  4. rewriting the SQL statement a number of different ways (which led me to the heart of the problem as above)

but they don't fix the problem. The wrong estimate on the nested loops cascades to produce tempDB spills down the line, impacting performance.

Has anyone faced similar issues? Would appreciate any help to fix this problem. Thank you.

The following code sets up the problem:

--- [a] 1 million row Numbers table
DROP TABLE IF EXISTS #Numbers;
CREATE TABLE #Numbers (Number int PRIMARY KEY);
INSERT INTO #Numbers (Number) 
 SELECT TOP 1000000 ROW_NUMBER() OVER (ORDER BY (SELECT 1)) - 1 
 FROM sys.objects A CROSS JOIN sys.objects B 

--- [b] Create Acct table and populate with 1 million accounts
DROP TABLE IF EXISTS dbo.Acct;
CREATE TABLE dbo.Acct (
  PkID  int             not null IDENTITY(1,1),
  AcctNo    varchar(48)     not null PRIMARY KEY,
  Balance   decimal(20,10)  not null constraint DF_Balance default(0)
)
INSERT INTO dbo.Acct (AcctNo) 
 SELECT RIGHT( (REPLICATE('0',6) + CAST(number as varchar(6))), 6)
 FROM #Numbers
 ORDER BY Number

--- [c] Insert 10K transactions. Each Acct gets 2 txns
DROP TABLE IF EXISTS dbo.AcctTxns;
CREATE TABLE dbo.AcctTxns
(
  PkID      int                not null IDENTITY(1,1),
  AcctNo        varchar(48)    not null,
  TxnID     nvarchar(50)       not null,    
  Amt           decimal(20,10) not null,
  TxnStatus nvarchar(10)       not null,
  LastBalance   decimal(20,10) null
  PRIMARY KEY (AcctNo, TxnID, TxnStatus)
)
DROP TABLE IF EXISTS #Acct_Inserted_3XB9F;
CREATE TABLE #Acct_Inserted_3XB9F
(
  AcctNo        varchar(48)         not null PRIMARY KEY,   
  Balance       decimal(20,10)      null
)

declare @TxnCount int = 10000
; WITH Txns (RowNo, TxnID) AS (
   SELECT Number, '#T9-' + RIGHT(REPLICATE('0',8) + CAST(Number as varchar(8)), 8)
   FROM #Numbers WHERE Number BETWEEN 1 AND @TxnCount/2
  UNION
   SELECT Number, '#T9-' + RIGHT(REPLICATE('0',8) + CAST(Number as varchar(8)), 8)
   FROM #Numbers WHERE Number BETWEEN @TxnCount/2+1 AND @TxnCount
)
INSERT INTO dbo.AcctTxns (AcctNo, TxnID, Amt, TxnStatus)
  SELECT A.AcctNo, T.TxnID, 100, 'COMM'
  FROM dbo.Acct A JOIN Txns T ON A.PkID = T.RowNo

--- [d] Update statistics
UPDATE STATISTICS dbo.Acct;
UPDATE STATISTICS dbo.AcctTxns;

--- [e] PROBLEM HERE ...
SET STATISTICS IO, XML ON;
SELECT TxnCount=COUNT(1) 
FROM dbo.Acct A INNER JOIN dbo.AcctTxns T 
ON A.AcctNo = T.AcctNo
SET STATISTICS IO, XML OFF;
Gerardo Grignoli
  • 14,058
  • 7
  • 57
  • 68
Y-Mi Wong
  • 86
  • 1
  • 5
  • Try to update statistics with full scan and run your query once. And also you can try with option(Recompile) hint. – Tharunkumar Reddy Sep 20 '18 at 08:05
  • Pictures of query plans don't often help, you would be better off using [Paste the Plan](https://www.brentozar.com/pastetheplan/). – Thom A Sep 20 '18 at 08:06
  • Execution plan was already pasted at suggested location.... (2nd link). Nicely prepared question IMHO. – Paul Maxwell Sep 20 '18 at 08:07
  • As a sidenote: The union may be intential for deduplication. If it's not and you don't care, use union all for performance improvement. – SQL_M Sep 20 '18 at 08:20
  • Thanks Tharunkumar. I tried OPTION (RECOMPILE) before but it didn't work. Same result. I'd be hesitant to use this hint in production as it'll degrade peformance for other queries - this is an OLTP table – Y-Mi Wong Sep 20 '18 at 08:40
  • Yes, have the full execution plan on Brent's "Paste the Plan" already. Thx – Y-Mi Wong Sep 20 '18 at 08:41
  • Noted on the UNION. But that was just quick code to setup the environment. The main issue is on line "[e] PROBLEM HERE" – Y-Mi Wong Sep 20 '18 at 08:42
  • Joining on varchar(48) is a bad idea. I doubt server can make any reasonably usable statistics on this. – Antonín Lejsek Sep 21 '18 at 01:17
  • @Y-Mi Wong Note, OPTION (RECOMPILE) only affects the single query it's applied to, so it won't affect other queries performances in your production system. – J.D. Oct 31 '20 at 03:52
  • Also, just curious what happens if you apply the WITH (FORCESEEK) query hint to the AcctTxns table in the query? – J.D. Oct 31 '20 at 03:56
  • Tried the OPTION (RECOMPILE) too. But one of my team members changed the "AcctNo" column from varchar to int and it improved performance. – Y-Mi Wong Aug 21 '21 at 03:18

1 Answers1

0

It seems you're missing nonclustered indexes on the two tables, on the columns you're joining.

CREATE NONCLUSTERED INDEX NC_AcctNo on Acct(AcctNo) INCLUDE (Balance);
CREATE NONCLUSTERED INDEX NC_AcctNo on Acctxns(AcctNo) INCLUDE (TotalAmt);

You should have better estimates with your query, but if you're not filtering the data from both tables with a WHERE clause, then you'll get index scans instead of clustered index scans, which are slightly better in terms of performance if you use the above indexes.

But they still can take up some time and resources, depending on the number of rows you need returned.

Aditionally you could have a look at this answer from Paul White on operator estimates and the other answers to the question as well.

Radu Gheorghiu
  • 20,049
  • 16
  • 72
  • 107
  • Thanks, but the indexes aren't the issue. The real code has those covering indexes you suggested. The issue is the INNER JOIN that translates into a "Nested Loops" operator where the "Estimated Number of Rows" is way off from the input numbers coming in from the preceding index scan/seek nodes. Note that these input nodes each have the correct "Estimated Number of Rows" of 10000. – Y-Mi Wong Sep 20 '18 at 08:44
  • @Y-MiWong I didn't have the chance to test this, but what about your PK? I'd try changing my PK for the AcctTxns table from the default (and silent) CLUSTERED to NONCLUSTERED and then create a clustering key only over the AcctNo column. Maybe that will give you better estimates and maybe give you the `MERGE` / `HASH` join you want? – Radu Gheorghiu Sep 20 '18 at 08:45
  • Also tried that. In the real code, I the Acct and AcctTxns tables are both temp tables, which I populated via an ETL process. I tried populating these temp tables first, then creating separate non-clustered indexes with the necessary covering indexes. I even tried updating statistics explicitly after these index creations. No luck. I've also tested the scenario with real tables (above code) and problem still occurs – Y-Mi Wong Sep 20 '18 at 08:50