-2

I have a job in Pentaho Data Integration that consists of a simple transformation involving a table input and a table output.

The query in the table input is as follows:

SELECT --TOP 1000
 IL.Invoice_Date
,MONTH(IL.Invoice_Date) AS Invoice_Month
,YEAR(IL.Invoice_Date) AS Invoice_Year
,IH.Invoice_Status
,IH.Invoice_Type
,IL.Invoice_Number
,IH.Invoice_Category
,IH.Invoice_Site
,IH.Source_Document
,IH.Source_Date
,IL.Customer_ID
,CT.Old_Customer_ID
,CT.Customer_Name
,CT.Customer_Country_ISO AS Customer_Country
,CT.Market
,CT.Escalao
,CT.Rating_B2B_B2C
,PR.Product_Reference
,PR.Product
,PR.Colour_ID
,PR.Colour
,PR.Size
,PR.Design
,PR.STK_Product_Family
,PR.FAM1
,PR.FAM2
,PR.FAM3
,PR.Product_Line
,OH.Order_Date
,YEAR(OH.Order_Date) AS Order_Year
,CASE
 WHEN OH.Order_Type IS NOT NULL THEN OH.Order_Type
 WHEN IL.Order_ID IS NOT NULL THEN LEFT(IL.Order_ID,3)
 ELSE NULL 
 END AS Order_Type
,IL.Order_ID
,CASE
 WHEN OH.Order_Number IS NOT NULL THEN OH.Order_Number
 WHEN IL.Order_ID IS NOT NULL AND IL.Order_ID <> '' THEN RIGHT(IL.Order_ID, LEN(IL.Order_ID) - 7)
 ELSE NULL 
 END AS Order_Number
,OH.Origin
,OH.Delivery_Country
,OH.Stock_Site AS Warehouse
,OH.Customer_Order_Ref
,IL.Invoiced_Quantity
,CAST(OL.Max_Price_A * COALESCE(ERH.Currency_Rate,1) AS DECIMAL(15,4)) AS Ref_Price_A
,IL.Unit_Net_Price
,IL.Unit_Gross_Price
,IL.Cost_Price
,IL.Margin
,(CASE WHEN IH.Invoice_Category = 'Credit Note' THEN -1 ELSE 1 END) * IL.Line_Net_Price AS Line_Net_Price
,IL.Line_Gross_Price
,IL.Line_Cost_Price
,IL.Currency

,((CASE WHEN IH.Invoice_Category = 'Credit Note' THEN -1 ELSE 1 END) * IL.Unit_Net_Price * COALESCE(ER.Currency_Rate,1)) AS Unit_Net_Price_EUR
,((CASE WHEN IH.Invoice_Category = 'Credit Note' THEN -1 ELSE 1 END) * IL.Unit_Gross_Price * COALESCE(ER.Currency_Rate,1)) AS Unit_Gross_Price_EUR
--,((CASE WHEN IH.Invoice_Category = 'Credit Note' THEN -1 ELSE 1 END) * IL.Unit_Gross_Price * COALESCE(ER.Currency_Rate,1)) AS Natural_Price
,((CASE WHEN IH.Invoice_Category = 'Credit Note' THEN -1 ELSE 1 END) * IL.Cost_Price * COALESCE(ER.Currency_Rate,1)) AS Cost_Price_EUR
,((CASE WHEN IH.Invoice_Category = 'Credit Note' THEN -1 ELSE 1 END) * IL.Margin * COALESCE(ER.Currency_Rate,1)) AS Margin_EUR
,((CASE WHEN IH.Invoice_Category = 'Credit Note' THEN -1 ELSE 1 END) * IL.Line_Net_Price * COALESCE(ER.Currency_Rate,1)) AS Line_Net_Price_EUR
,((CASE WHEN IH.Invoice_Category = 'Credit Note' THEN -1 ELSE 1 END) * IL.Line_Gross_Price * COALESCE(ER.Currency_Rate,1)) AS Line_Gross_Price_EUR
--,((CASE WHEN IH.Invoice_Category = 'Credit Note' THEN -1 ELSE 1 END) * IL.Line_Gross_Price * COALESCE(ER.Currency_Rate,1)) AS Natural_Line
,((CASE WHEN IH.Invoice_Category = 'Credit Note' THEN -1 ELSE 1 END) * IL.Line_Cost_Price * COALESCE(ER.Currency_Rate,1)) AS Line_Cost_Price_EUR

,CT.Special_Customer
,PR.Reference_Type
,CT.Area_Comercial
,CT.Market_Comercial
,CT.Zone_Comercial
,CT.Account_Manager
,CT.Sales_Manager
,CT.Sales_Director
,IL.[ICMS_Base]
,IL.[ICMS_Value]
,IL.[IPI_Base]
,IL.[IPI_Value]
,IL.[ICMS_ST_Base]
,IL.[ICMS_ST_Value]
,IL.[ICMS_ST_FinalValue]
,IL.Single_Identifier

FROM [BD_BI].[dwm].[Invoice_Line] AS IL
LEFT JOIN [BD_BI].[dwm].[Invoice_Header] AS IH ON IL.Invoice_Number = IH.Invoice_Number
LEFT JOIN [BD_BI].[dwm].[Customer] AS CT ON IL.Customer_ID = CT.Customer_ID
LEFT JOIN [BD_BI].[dwm].[Product] AS PR ON IL.Sage_Prod_ID = PR.Sage_Prod_ID
LEFT JOIN [BD_BI].[dwm].[Order_Header] AS OH ON IL.Order_ID = OH.Order_ID
LEFT JOIN [BD_BI].[dwm].[Exchange_Rate] AS ER ON IL.Invoice_Date >= ER.Start_Date AND IL.Invoice_Date <= ER.End_Date AND IL.Currency = ER.Source_Currency AND ER.Target_Currency = 'EUR'
LEFT JOIN [BD_BI].[dwm].[Exchange_Rate] AS ERH ON OH.Order_Date >= ERH.Start_Date AND OH.Order_Date <= ERH.End_Date AND OH.Currency = ERH.Source_Currency AND ERH.Target_Currency = 'EUR'
LEFT JOIN (
    SELECT
     Order_ID AS Join_Order_ID
    ,Sage_Prod_ID AS _Join_Sage_Prod_ID
    ,MAX(Ref_Price_A) AS Max_Price_A
    FROM [BD_BI].[dwm].[Order_Line]
    GROUP BY
     Order_ID
    ,Sage_Prod_ID
    ) AS OL ON IL.Order_ID = OL.Join_Order_ID AND IL.Sage_Prod_ID = OL._Join_Sage_Prod_ID

This query takes around 3 minutes to run in SQL Server Management Studio (SSMS), but the job in Pentaho takes around 2 hours to finish running. The table output step does a truncate table every time the job runs, and the query outputs 2,663,588 rows.

I have already attempted the following optimizations:

  1. Creating indexes, which made the query run in 4 minutes in SSMS, but with no noticeable improvement in Pentaho.
  2. Replacing the existing sub-query with a Common Table Expression (CTE), but this did not improve performance.
Tiago
  • 37
  • 6
  • 4
    I suggest you check the plans for differences and read [this article](https://www.sommarskog.se/query-plan-mysteries.html). – Dan Guzman Aug 07 '23 at 17:15
  • The problem could be in the extra time needed to bring all that data from the SQL Server machine to your local machine where you are running PDI. Or it could be a problem in the time needed to perform that query, I mean, I don't know how SSMS works, but maybe if your perform the query in SSMS, you are only get the first 100-1000-X rows, but if you force to really retrieve all those rows you might get similar results. – Ana GH Aug 08 '23 at 11:12
  • If the problem is on the time needed to bring the data from the SQL Server machine to your PDI machine, or the server where the Output Table is, you'll need perhaps to move the PDI installation to a server in the same network as the SQL Server. If you can't reduce the amount of data you need to move to another network, maybe extracting the data to a local file in the SQL Server, zipping it and then moving the file to your network to load to another DB could be quicker. – Ana GH Aug 08 '23 at 11:24
  • Or maybe you can modify your ETL process and instead of performing a whole data dump, you limit the amount of data to the data modified since the last run. You'll have a initial slow run, but the following amounts of data to process are smaller. Of course, this is only possible when the original data has been design to register modifications properly. – Ana GH Aug 08 '23 at 11:31

0 Answers0