1

We have a transact sql statement that queries 4 tables with millions of rows in each.

It takes several minutes, even though it has been optimized with indexes and statistics according to TuningAdvisor.

The structure of the query is like:

SELECT E.EmployeeName
    , SUM(M.Amount) AS TotalAmount
    , SUM(B.Amount) AS BudgetAmount
    , SUM(T.Hours) AS TotalHours
    , SUM(TB.Hours) AS BudgetHours
    , SUM(CASE WHEN T.Type = 'Waste' THEN T.Hours ELSE 0 END) AS WastedHours
FROM Employees E
LEFT JOIN MoneyTransactions M
    ON E.EmployeeID = M.EmployeeID
LEFT JOIN BudgetTransactions B
    ON E.EmployeeID = B.EmployeeID
LEFT JOIN TimeTransactions T
    ON E.EmployeeID = T.EmployeeID
LEFT JOIN TimeBudgetTransactions TB
    ON E.EmployeeID = TB.EmployeeID
GROUP BY E.EmployeeName

Since each transaction table contains millions of rows, I consider splitting it up into one query per transaction table, using table variables like @real, @budget, and @hours, and then joining these in a final SELECT. But in tests it seems to not speed up.

How would you deal with that in order to speed it up?

Milen A. Radev
  • 60,241
  • 22
  • 105
  • 110
Ole Lynge
  • 4,457
  • 8
  • 43
  • 57

3 Answers3

8

I'm not sure the query you posted will yield the results you're expecting.

It will cross join all the dimension tables (MoneyTransactions etc.) and multiply all the results.

Try this:

SELECT  E.EmployeeName,
        (
        SELECT  SUM(amount)
        FROM    MoneyTransactions m
        WHERE   M.EmployeeID = E.EmployeeID
        ) AS TotalAmount,
        (
        SELECT  SUM(amount)
        FROM    BudgetTransactions m
        WHERE   M.EmployeeID = E.EmployeeID
        ) AS BudgetAmount,
        (
        SELECT  SUM(hours)
        FROM    TimeTransactions m
        WHERE   M.EmployeeID = E.EmployeeID
        ) AS TotalHours,
        (
        SELECT  SUM(hours)
        FROM    TimeBudgetTransactions m
        WHERE   M.EmployeeID = E.EmployeeID
        ) AS BudgetHours
FROM    Employees E
Quassnoi
  • 413,100
  • 91
  • 616
  • 614
  • Hmmm... Isn't that SELECT EmployeeID, EmployeeName, SUM(...), SUM(...) FROM Employees GROUP BY EmployeeID, EmployeeName? – Tomalak Apr 07 '09 at 10:43
  • @Quassnoi: Thanks. I thought nested queries (SELECTs inside SELECT) would be slower than JOINs... Have not tried your suggestion yet... – Ole Lynge Apr 07 '09 at 12:41
  • No, they won't. Actually, the query you posted is slow because it makes lots of unnesessary joins and produces the incorrect results. If there are 100 rows per employee in each transaction table, you'll get 10,000,000 rows for each employee in a result, which is most probably not what you want. – Quassnoi Apr 07 '09 at 12:53
1

I don't know if you have all the indexes on your tables that will speed up things, but having big tables could have this impact on a query time. I would recommend partitioning the tables if possible. It is more work, but everything you do to speed up the query now it won't be enough after few millions new records.

Bojan Milenkoski
  • 501
  • 5
  • 13
0

Try this one:

SELECT E.EmployeeName, TA.TotalAmount, BA.BudgetAmount, TWH.TotalHours, BH.BudgetHours, TWH.WastedHours
FROM Employees E
LEFT JOIN 
(SELECT E.EmployeeID, SUM(M.Amount) AS TotalAmount
FROM Employees E INNER JOIN MoneyTransactions M ON E.EmployeeID = M.EmployeeID GROUP BY E.EmployeeID)TA
ON E.EmployeeID = TA.EmployeeID
LEFT JOIN 
(SELECT E.EmployeeID , SUM(B.Amount) AS BudgetAmount
FROM Employees E INNER JOIN BudgetTransactions B ON E.EmployeeID = B.EmployeeID GROUP BY E.EmployeeID)BA
ON E.EmployeeID = BA.EmployeeID
LEFT JOIN 
(SELECT E.EmployeeID , SUM(T.Hours) AS TotalHours , SUM(CASE WHEN T.Type = 'Waste' THEN T.Hours ELSE 0 END) AS WastedHours
FROM Employees E INNER JOIN TimeTransactions T ON E.EmployeeID = T.EmployeeID GROUP BY E.EmployeeID)TWH
ON E.EmployeeID = TWH.EmployeeID
LEFT JOIN 
(SELECT E.EmployeeID , SUM(TB.Hours) AS BudgetHours
FROM Employees E INNER JOIN TimeBudgetTransactions TB ON E.EmployeeID = TB.EmployeeID GROUP BY E.EmployeeID)BH
ON E.EmployeeID = BH.EmployeeID