I have a SQL Server query with an INNER JOIN between a large table containing 1 million rows (Acct) and a smaller table containing 10,000 rows (AcctTxns), but SQL Server produces an execution plan with wrong cardinality estimates.
I've boiled down the problem to the following statement:
SELECT p.AcctNo, p.Balance + t.TotalAmt as New Balance
FROM Acct p JOIN AcctTxns t
ON p.AcctNo = t.AcctNo
The Nested Loops operator shows an "Estimated Number of Rows" of 16.2588 vs "Actual Number of Rows" of 10000.
I am using MS SQL Server 2016 (13.0.1742.0).
I've tried a number of fixes, including:
- updating statistics
- using temp tables for intermediate results
- turning off the 2014 cardinality estimator
- rewriting the SQL statement a number of different ways (which led me to the heart of the problem as above)
but they don't fix the problem. The wrong estimate on the nested loops cascades to produce tempDB spills down the line, impacting performance.
Has anyone faced similar issues? Would appreciate any help to fix this problem. Thank you.
The following code sets up the problem:
--- [a] 1 million row Numbers table
DROP TABLE IF EXISTS #Numbers;
CREATE TABLE #Numbers (Number int PRIMARY KEY);
INSERT INTO #Numbers (Number)
SELECT TOP 1000000 ROW_NUMBER() OVER (ORDER BY (SELECT 1)) - 1
FROM sys.objects A CROSS JOIN sys.objects B
--- [b] Create Acct table and populate with 1 million accounts
DROP TABLE IF EXISTS dbo.Acct;
CREATE TABLE dbo.Acct (
PkID int not null IDENTITY(1,1),
AcctNo varchar(48) not null PRIMARY KEY,
Balance decimal(20,10) not null constraint DF_Balance default(0)
)
INSERT INTO dbo.Acct (AcctNo)
SELECT RIGHT( (REPLICATE('0',6) + CAST(number as varchar(6))), 6)
FROM #Numbers
ORDER BY Number
--- [c] Insert 10K transactions. Each Acct gets 2 txns
DROP TABLE IF EXISTS dbo.AcctTxns;
CREATE TABLE dbo.AcctTxns
(
PkID int not null IDENTITY(1,1),
AcctNo varchar(48) not null,
TxnID nvarchar(50) not null,
Amt decimal(20,10) not null,
TxnStatus nvarchar(10) not null,
LastBalance decimal(20,10) null
PRIMARY KEY (AcctNo, TxnID, TxnStatus)
)
DROP TABLE IF EXISTS #Acct_Inserted_3XB9F;
CREATE TABLE #Acct_Inserted_3XB9F
(
AcctNo varchar(48) not null PRIMARY KEY,
Balance decimal(20,10) null
)
declare @TxnCount int = 10000
; WITH Txns (RowNo, TxnID) AS (
SELECT Number, '#T9-' + RIGHT(REPLICATE('0',8) + CAST(Number as varchar(8)), 8)
FROM #Numbers WHERE Number BETWEEN 1 AND @TxnCount/2
UNION
SELECT Number, '#T9-' + RIGHT(REPLICATE('0',8) + CAST(Number as varchar(8)), 8)
FROM #Numbers WHERE Number BETWEEN @TxnCount/2+1 AND @TxnCount
)
INSERT INTO dbo.AcctTxns (AcctNo, TxnID, Amt, TxnStatus)
SELECT A.AcctNo, T.TxnID, 100, 'COMM'
FROM dbo.Acct A JOIN Txns T ON A.PkID = T.RowNo
--- [d] Update statistics
UPDATE STATISTICS dbo.Acct;
UPDATE STATISTICS dbo.AcctTxns;
--- [e] PROBLEM HERE ...
SET STATISTICS IO, XML ON;
SELECT TxnCount=COUNT(1)
FROM dbo.Acct A INNER JOIN dbo.AcctTxns T
ON A.AcctNo = T.AcctNo
SET STATISTICS IO, XML OFF;