0

Question I've got a Redshift table which contains hierarchal data by month. I am trying to calculate sum for manager based on people under them but struggling to get the SQL working. Actual data contains multiple levels.

Example Sum of Bob's 'Person_actual' is some of all 'Person_actual' column for all the people reporting into him. Reading online I can see i've to use recursive query but can't get it working. Any guidance

Sample Data | Person_owner_id | Person_id | Person_target | Person_actual | Date_description |-----------------|-----------|---------------|---------------|----------------- | |Bob | | |Jan-23
| |Bob | | |Feb-23 | |Bob | | |Mar-23
| |Bob | | |Apr-23
| |Bob | | |May-23
| |Bob | | |Jun-23
| |Bob | | |Jul-23
| |Bob | | |Aug-23
| |Bob | | |Sep-23
| |Bob | | |Oct-23
| |Bob | | |Nov-23
| |Bob | | |Dec-23
|Bob |Chris |10 |9 |Jan-23
|Bob |Chris |10 |9 |Feb-23
|Bob |Chris |10 |7 |Mar-23
|Bob |Chris |10 |5 |Apr-23
|Bob |Chris |10 |6 |May-23
|Bob |Chris |10 |3 |Jun-23
|Bob |Chris |10 |2 |Jul-23
|Bob |Chris |10 | |Aug-23
|Bob |Chris |10 | |Sep-23
|Bob |Chris |10 | |Oct-23
|Bob |Chris |10 | |Nov-23 |Bob |Chris |10 | |Dec-23
|Bob |Ali |10 |9 |Jan-23
|Bob |Ali |10 |9 |Feb-23
|Bob |Ali |10 |7 |Mar-23
|Bob |Ali |10 |5 |Apr-23
|Bob |Ali |10 |6 |May-23
|Bob |Ali |10 |3 |Jun-23
|Bob |Ali |10 |2 |Jul-23
|Bob |Ali |10 | |Aug-23
|Bob |Ali |10 | |Sep-23
|Bob |Ali |10 | |Oct-23
|Bob |Ali |10 | |Nov-23 |Bob |Ali |10 | |Dec-23

Expected Output | Person_owner_id | Person_id | Person_target | Person_actual | Date_description |-----------------|-----------|---------------|---------------|----------------- | |Bob | |18 |Jan-23
| |Bob | |19 |Feb-23 | |Bob | |8 |Mar-23
| |Bob | |10 |Apr-23
| |Bob | |12 |May-23
| |Bob | |6 |Jun-23
| |Bob | |4 |Jul-23
| |Bob | | |Aug-23
| |Bob | | |Sep-23
| |Bob | | |Oct-23
| |Bob | | |Nov-23
| |Bob | | |Dec-23
|Bob |Chris |10 |9 |Jan-23
|Bob |Chris |10 |9 |Feb-23
|Bob |Chris |10 |7 |Mar-23
|Bob |Chris |10 |5 |Apr-23
|Bob |Chris |10 |6 |May-23
|Bob |Chris |10 |3 |Jun-23
|Bob |Chris |10 |2 |Jul-23
|Bob |Chris |10 | |Aug-23
|Bob |Chris |10 | |Sep-23
|Bob |Chris |10 | |Oct-23
|Bob |Chris |10 | |Nov-23 |Bob |Chris |10 | |Dec-23
|Bob |Ali |10 |9 |Jan-23
|Bob |Ali |10 |10 |Feb-23
|Bob |Ali |10 |1 |Mar-23
|Bob |Ali |10 |5 |Apr-23
|Bob |Ali |10 |6 |May-23
|Bob |Ali |10 |3 |Jun-23
|Bob |Ali |10 |2 |Jul-23
|Bob |Ali |10 | |Aug-23
|Bob |Ali |10 | |Sep-23
|Bob |Ali |10 | |Oct-23
|Bob |Ali |10 | |Nov-23 |Bob |Ali |10 | |Dec-23

AIViz
  • 82
  • 9

2 Answers2

0
--- Table definition
CREATE TABLE rstable (
  Person_owner_id TEXT,
  Person_id TEXT,
  Person_target INTEGER,
  Person_actual INTEGER,
  Date_description TEXT
);

--- Recreating data as insert statement
INSERT INTO rstable (Person_owner_id, Person_id, Person_target, Person_actual, Date_description) VALUES
('Bob', NULL, 18, 0, 'Jan-23'),
('Bob', NULL, 19, 0, 'Feb-23'),
('Bob', NULL, 8, 0, 'Mar-23'),
('Bob', NULL, 10, 0, 'Apr-23'),
('Bob', NULL, 12, 0, 'May-23'),
('Bob', NULL, 6, 0, 'Jun-23'),
('Bob', NULL, 4, 0, 'Jul-23'),
('Bob', NULL, NULL, 0, 'Aug-23'),
('Bob', NULL, NULL, 0, 'Sep-23'),
('Bob', NULL, NULL, 0, 'Oct-23'),
('Bob', NULL, NULL, 0, 'Nov-23'),
('Bob', NULL, NULL, 0, 'Dec-23'),
('Bob', 'Chris', 10, 9, 'Jan-23'),
('Bob', 'Chris', 10, 9, 'Feb-23'),
('Bob', 'Chris', 10, 7, 'Mar-23'),
('Bob', 'Chris', 10, 5, 'Apr-23'),
('Bob', 'Chris', 10, 6, 'May-23'),
('Bob', 'Chris', 10, 3, 'Jun-23'),
('Bob', 'Chris', 10, 2, 'Jul-23'),
('Bob', 'Chris', 10,NULL,'Aug-23'),
('Bob', 'Chris', 10,NULL,'Sep-23'),
('Bob', 'Chris', 10,NULL,'Oct-23'),
('Bob', 'Chris', 10,NULL,'Nov-23'),
('Bob', 'Chris', 10,NULL,'Dec-23'),
('Bob','Ali' ,10 ,9 ,'Jan-23' ),
('Bob','Ali' ,10 ,9 ,'Feb-23' ),
('Bob','Ali' ,10 ,7 ,'Mar-23' ),
('Bob','Ali' ,10 ,5 ,'Apr-23' ),
('Bob','Ali' ,10 ,6 ,'May-23' ),
('Bob','Ali' ,10 ,3 ,'Jun-23' ),
('Bob','Ali' ,10 ,2 ,'Jul-23' ),
('Bob','Ali' ,10,NULL,'Aug-23' ),
('Bob','Ali' ,10,NULL,'Sep-23' ),
('Bob','Ali' ,10,NULL,'Oct-23' ),
('Bob','Ali' ,10,NULL,'Nov-23' ),
('Bob','Ali' ,10,NULL,'Dec-23');

The query

WITH RECURSIVE t(
  Person_owner_id,
  Person_id,
  Person_target,
  Person_actual,
  Date_description,
  SUM_ACTUAL
) AS (
  SELECT
    Person_owner_id,
    Person_id,
    Person_target,
    Person_actual,
    Date_description,
    SUM(Person_actual) OVER (
      PARTITION BY Person_owner_id
      ORDER BY Date_description
      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS SUM_ACTUAL
  FROM
    (
      SELECT
        Person_owner_id,
        Person_id,
        Person_target,
        Person_actual,
        Date_description
      FROM
        rstable
    ) t
  WHERE
    Person_owner_id IS NOT NULL
)
SELECT
  t.Person_owner_id,
  t.Person_id,
  t.Person_target,
  t.Person_actual,
  t.Date_description,
  t.SUM_ACTUAL
FROM
  t;

We first create a recursive CTE called t. It starts with the rows where the Person_owner_id is NULL. These rows represent the top-level managers. The query then calculates the cumulative sum of the Person_actual column for each manager and their subordinates. The query continues to iterate through the t CTE until there are no more rows and here's the output Query result

PGzlan
  • 81
  • 4
  • Hi @PGzlan, thanks for the answer. However, it is not what I am after. As it shows in your output, you are summing Ali based on Chris as well. In our output, Ali & Chris should remain as it is as they do not have any subordinates but Bob should have a sum based on Ali & Chris. Example (for sum_actual): Bob | Jan 23 | 18 Ali | Jan 23 | 9 Chris Jan 23 | 9 Bob | Feb 23 | 18 Ali | Feb 23 | 9 Chris | Feb 23 | 9 Logic: For each month, sum the actual value for all the subordinates just for that month – AIViz Aug 21 '23 at 11:20
0

I was able to get the results I wanted with following query:

SELECT
    t1.person_owner_id,
    t1.date_from,
    COALESCE(SUM(t2.person_actual), 0) AS sum_person_actual
FROM (
    SELECT DISTINCT person_owner_id, date_from
    FROM staging.rstable
) AS t1
CROSS JOIN (
    SELECT DISTINCT person_id
    FROM staging.rstable
    WHERE person_id IS NOT NULL
) AS subordinates
LEFT JOIN staging.rstable AS t2
    ON t1.person_owner_id = t2.person_owner_id
    AND t1.date_from = t2.date_from
    AND t2.person_id = subordinates.person_id
GROUP BY t1.person_owner_id, t1.date_from
ORDER BY t1.person_owner_id, t1.date_from;
AIViz
  • 82
  • 9