
I am new to Pentaho. I have a requirement involving hierarchically structured data: I select a subset of the hierarchy, do a count/sum on the lowest-level nodes, then go up level by level, adding each level's counts to the previously accumulated totals. All I know is how to do this in SQL with a table variable (the main point here is that I need to do this in memory and not do any updates to the data table):

DECLARE @ConsolidateMaster TABLE
(
-- my fields
)

INSERT INTO @ConsolidateMaster
-- select some subset of data and insert it into the table variable


-- Leaf employees (no direct reports) start with zero counts and costs
UPDATE A
SET A.[DIR_REPORT_CNT] = 0,
    A.[DIR_REPORT_COST] = 0,
    A.[TOTAL_REPORT_CNT] = 0,
    A.[TOTAL_REPORT_COST] = 0
FROM @ConsolidateMaster A
LEFT JOIN @ConsolidateMaster B
ON A.EMP_ID = B.SUPERVISOR_ID
WHERE B.EMP_ID IS NULL AND A.PROJ_CD = @PROJ_CD

--GO UP LEVEL BY LEVEL AND START COUNTING THE TOTAL + COST
DECLARE @MAX_LEVEL INT = (SELECT MAX(SUPERVISOR_LAYER) FROM @ConsolidateMaster WHERE PROJ_CD = @PROJ_CD)

WHILE (SELECT COUNT(1) FROM @ConsolidateMaster WHERE [DIR_REPORT_CNT] IS NULL) > 0
BEGIN
    UPDATE 
        A
    SET 
        A.DIR_REPORT_CNT = B.DIR_REPORT_CNT,
        A.DIR_REPORT_COST = B.DIR_REPORT_COST,
        A.TOTAL_REPORT_CNT = B.TOTAL_REPORT_CNT,
        A.TOTAL_REPORT_COST = B.TOTAL_REPORT_COST
    FROM @ConsolidateMaster A
    INNER JOIN 
    (
    SELECT 
        C.SUPERVISOR_ID,
        COUNT(1) DIR_REPORT_CNT,
        SUM(ISNULL(ANNUAL_COST,0)) DIR_REPORT_COST,
        SUM(ISNULL(C.[TOTAL_REPORT_CNT], 0)) + COUNT(1) TOTAL_REPORT_CNT,
        SUM(ISNULL(C.TOTAL_REPORT_COST, 0)) + SUM(ISNULL(ANNUAL_COST, 0)) TOTAL_REPORT_COST
    FROM @ConsolidateMaster C
    WHERE 
        C.SUPERVISOR_LAYER = @MAX_LEVEL 
        AND C.PROJ_CD = @PROJ_CD
    GROUP BY 
        C.SUPERVISOR_ID
    ) B
    ON A.EMP_ID = B.SUPERVISOR_ID
    WHERE A.PROJ_CD = @PROJ_CD 
          AND A.SUPERVISOR_LAYER < @MAX_LEVEL

    SET @MAX_LEVEL = @MAX_LEVEL - 1
END
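To make the level-by-level logic explicit independent of T-SQL, here is a minimal in-memory sketch in plain Python. The field names (`emp_id`, `supervisor_id`, `annual_cost`, `layer`) mirror the table variable's columns but are assumptions; the function is an illustration of the bottom-up rollup, not part of the original code.

```python
def rollup(employees):
    """Bottom-up rollup over an adjacency-list hierarchy.

    employees: list of dicts with emp_id, supervisor_id, annual_cost, layer.
    Mutates each row in place, like the UPDATEs against @ConsolidateMaster.
    """
    by_id = {e["emp_id"]: e for e in employees}

    # Every row starts at zero, like the initial UPDATE for leaf nodes.
    for e in employees:
        e["dir_report_cnt"] = 0
        e["dir_report_cost"] = 0.0
        e["total_report_cnt"] = 0
        e["total_report_cost"] = 0.0

    # Walk from the deepest layer up; each pass adds one layer's
    # counts and costs to its supervisors (the WHILE loop above).
    for layer in sorted({e["layer"] for e in employees}, reverse=True):
        for e in employees:
            if e["layer"] != layer:
                continue
            boss = by_id.get(e["supervisor_id"])
            if boss is None:
                continue
            boss["dir_report_cnt"] += 1
            boss["dir_report_cost"] += e["annual_cost"]
            boss["total_report_cnt"] += 1 + e["total_report_cnt"]
            boss["total_report_cost"] += e["annual_cost"] + e["total_report_cost"]
    return employees
```

Calling `rollup` on a four-row toy hierarchy (one root, two direct reports, one grandchild) gives the root a direct count of 2 and a total count of 3, which is the behavior the loop above implements.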

After doing the updates on the table variable, I would simply SELECT * FROM @ConsolidateMaster. How can I implement similar functionality in Pentaho?

  • If you have it all done and ready in SQL, why do you still need PDI? If you describe your intended solution in more detail, it will be easier for people to give an answer that's more specific than "use Group By". – Cyrus May 15 '17 at 16:45
  • @Cyrus Well, that is what I told my superiors, but they said we are using JDBC & Pentaho and currently have no stored procedures in the database (basically, by architecture). I don't need a specific solution, just hints on how to replicate table-variable functionality in Pentaho (row update operations in memory). – Krunal Parekh May 16 '17 at 01:58

1 Answer


You shouldn't ask how to mimic a table variable in Kettle. I understand you found a T-SQL solution to your problem, but T-SQL and Kettle don't work at the same level of abstraction. Forget about your solution for a while and concentrate on your problem: "How do I extract hierarchical data stored as an adjacency list in a database (SQL Server) and aggregate (count, sum) certain columns using a dataflow tool (Kettle)?"

There's a step named Closure Generator that can calculate the transitive closure of your ConsolidateMaster relation and emit tuples [supervisor_id, emp_id, distance]. Combined with Merge Join, Sort rows, and Group By, it should produce the results you're looking for.
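To see what this pipeline computes, here is a hedged Python sketch (not Kettle itself): `closure()` plays the role of the Closure Generator step, emitting (supervisor_id, emp_id, distance) tuples, and `totals()` does what a Group By on supervisor_id would do afterwards. All function names and the toy data structures are assumptions for illustration.

```python
def closure(parent_of):
    """Transitive closure of an adjacency list.

    parent_of: {emp_id: supervisor_id or None}.
    Yields (ancestor_id, emp_id, distance) for every ancestor of each employee,
    like the Closure Generator step's output rows.
    """
    for emp in parent_of:
        dist, boss = 1, parent_of[emp]
        while boss is not None:
            yield (boss, emp, dist)
            boss = parent_of.get(boss)
            dist += 1

def totals(parent_of, cost):
    """Aggregate the closure rows per supervisor, like a Group By step.

    Returns {boss_id: (total_cnt, total_cost, dir_cnt, dir_cost)};
    distance == 1 identifies direct reports.
    """
    agg = {}
    for boss, emp, dist in closure(parent_of):
        cnt, cst, dcnt, dcst = agg.get(boss, (0, 0.0, 0, 0.0))
        agg[boss] = (cnt + 1,
                     cst + cost[emp],
                     dcnt + (dist == 1),
                     dcst + cost[emp] * (dist == 1))
    return agg
```

The key design point is that the closure makes every ancestor/descendant pair an explicit row, so a single grouping pass replaces the level-by-level loop from the T-SQL version.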

marabu