-1

I have a table in amazon redshift (datagrip) like so:

Contract_ID Starting_Month Contract_Duration_In_Months Collection_Due_Date Target Amount_Collected
10001 01/01/2022 12 01/01/2022 10000 40000
10001 01/01/2022 12 01/02/2022 10000 0
10001 01/01/2022 12 01/03/2022 10000 0
10001 01/01/2022 12 01/04/2022 10000 0
10001 01/01/2022 12 01/05/2022 10000 0
10001 01/01/2022 12 01/06/2022 10000 0
10001 01/01/2022 12 01/07/2022 10000 0
10001 01/01/2022 12 01/08/2022 10000 30000
10001 01/01/2022 12 01/09/2022 10000 2500
10001 01/01/2022 12 01/10/2022 10000 0
10001 01/01/2022 12 01/11/2022 10000 0
10001 01/01/2022 12 01/12/2022 10000 0
10002 01/01/2022 8 01/03/2022 5000 12000
10002 01/01/2022 8 01/04/2022 5000 1000
10002 01/01/2022 8 01/05/2022 5000 0
10002 01/01/2022 8 01/06/2022 5000 0
10002 01/01/2022 8 01/07/2022 5000 10000
10002 01/01/2022 8 01/08/2022 5000 0
10002 01/01/2022 8 01/09/2022 5000 0
10002 01/01/2022 8 01/10/2022 5000 0

Whereby each month I have a target amount and a payment (amount_collected), and each month I want to see how much of my target I've achieved (the achieved amount can never be higher than the target), if the payment exceeds the target then the amount achieved will trickle down to the next month. Like so:

Contract_ID Starting_Month Contract_Duration_In_Months Collection_Due_Date Target Amount_Collected Achieved Overpayment
10001 01/01/2022 12 01/01/2022 10000 40000 10000 30000
10001 01/01/2022 12 01/02/2022 10000 0 10000 20000
10001 01/01/2022 12 01/03/2022 10000 0 10000 10000
10001 01/01/2022 12 01/04/2022 10000 0 10000 0
10001 01/01/2022 12 01/05/2022 10000 0 0 0
10001 01/01/2022 12 01/06/2022 10000 0 0 0
10001 01/01/2022 12 01/07/2022 10000 0 0 0
10001 01/01/2022 12 01/08/2022 10000 30000 10000 20000
10001 01/01/2022 12 01/09/2022 10000 2500 10000 12500
10001 01/01/2022 12 01/10/2022 10000 0 10000 2500
10001 01/01/2022 12 01/11/2022 10000 0 2500 0
10001 01/01/2022 12 01/12/2022 10000 0 0 0
10002 01/01/2022 8 01/03/2022 5000 12000 5000 7000
10002 01/01/2022 8 01/04/2022 5000 1000 5000 3000
10002 01/01/2022 8 01/05/2022 5000 0 3000 0
10002 01/01/2022 8 01/06/2022 5000 0 0 0
10002 01/01/2022 8 01/07/2022 5000 10000 5000 5000
10002 01/01/2022 8 01/08/2022 5000 0 5000 0
10002 01/01/2022 8 01/09/2022 5000 0 0 0
10002 01/01/2022 8 01/10/2022 5000 0 0 0

I can't use SUM() OVER(ORDER BY MONTH ROWS UNBOUNDED PRECEDING) because even though the achievements in May, June and July were missed, the achieved column only looks forward, not backwards. I think the best way to achieve this would be to have an Overpayment column which is like a rolling sum of the previous overpayment values - the previously achieved value BUT if the overpayment was previously 0 then we have a month where there is an overpayment the previously rolling sum needs to be discarded and the overpayment value reset to the payment-target.

Simply put, I need overpayments to trickle to down the achieved columns over the months taking into consideration the payments made until overpayments are finished, ignoring any months where the target wasn't achieved.

Anonymous
  • 440
  • 3
  • 14
  • Hi - I may be missing something but the two tables you've shown appear to be identical? Also, please don't use images in your questions - always use editable text, then someone can copy your data if they want to try out a solution – NickW Jul 05 '23 at 09:46
  • apologies I fixed the images and added the results in editable text – Anonymous Jul 05 '23 at 09:56
  • Hi - I'm not sure what you mean by "fixed the images" as they are still images and not editable text. Also, please could your format the data you have provided as editable text as a table so that it is readable, the data and the headers line up, etc? Thanks – NickW Jul 05 '23 at 11:29
  • I've replaced both images with editable tables. – Anonymous Jul 05 '23 at 12:49

1 Answers1

1

Interesting challenge! Probably one of the hardest SQL questions I've faced recently - so hopefully someone doesn't come along with a simpler/obvious solution :)

This appears to give the correct result based on the sample data provided. I've split it up into multiple CTEs to hopefully make the logic clearer; some of the CTEs could probably be combined:

-- Creates an identifier each time an amount is collected 
-- and groups all subsequent records until the next amount is collected
WITH data1
     AS (SELECT *,
                SUM(CASE
                      WHEN amount_collected > 0 THEN 1
                      ELSE 0
                    END)
                  over (
                    PARTITION BY contract_id
                    ORDER BY collection_due_date ASC ROWS BETWEEN unbounded
                  preceding
                  AND
                  CURRENT ROW) AS TRACKER
         FROM   trickle)
-- Provides a cumulative sum of the target/collected/balance amounts 
-- within each 'tracker' group. LAG(TRACKER) is needed for a subsequent step
,
     data2
     AS (SELECT *,
                SUM(target)
                  over (
                    PARTITION BY contract_id, tracker
                    ORDER BY collection_due_date ASC ROWS BETWEEN unbounded
                  preceding
                  AND
                  CURRENT ROW)                        TARGET_RUNNING,
                SUM(amount_collected)
                  over (
                    PARTITION BY contract_id, tracker
                    ORDER BY collection_due_date ASC ROWS BETWEEN unbounded
                  preceding
                  AND
                  CURRENT ROW)                        AMT_COLL_RUNNING,
                SUM(amount_collected - target)
                  over (
                    PARTITION BY contract_id, tracker
                    ORDER BY collection_due_date ASC ROWS BETWEEN unbounded
                  preceding
                  AND
                  CURRENT ROW)                        BAL_RUNNING,
                Lag(tracker)
                  over (
                    PARTITION BY contract_id
                    ORDER BY collection_due_date ASC) LAG_TRACKER
         FROM   data1)
-- BAL_RUNNING is calculated using a windows function so lag_run_bal needs to be 
-- in another CTE as nested windows functions aren't allowed 
,
     data3
     AS (SELECT *,
                Lag(bal_running)
                  over (
                    PARTITION BY contract_id
                    ORDER BY collection_due_date ASC) LAG_RUN_BAL
         FROM   data2)
-- Calculates the available balance
,
     data4
     AS (SELECT *,
                CASE
                  WHEN tracker <> lag_tracker
                       AND lag_run_bal > 0 THEN amt_coll_running + lag_run_bal
                  ELSE amt_coll_running
                END AMT_COLL_RUNNING1
         FROM   data3)
SELECT contract_id,
       collection_due_date,
       target,
       amount_collected,
       CASE
         WHEN target_running <= amt_coll_running1 THEN target
         ELSE 0
       END                                             ACHIEVED,
       Greatest(amt_coll_running1 - target_running, 0) OVERPAYMENT
FROM   data4
ORDER  BY contract_id,
          collection_due_date ASC; 

Updated Solution

I think there may be a typo in your output table:

Contract_ID Starting_Month Contract_Duration_In_Months Collection_Due_Date Target Amount_Collected Achieved Overpayment
10001 01/01/2022 12 01/09/2022 10000 2500 10000 12500
10001 01/01/2022 12 01/10/2022 10000 0 2500 0
10001 01/01/2022 12 01/11/2022 10000 0 0 0

should be:

Contract_ID Starting_Month Contract_Duration_In_Months Collection_Due_Date Target Amount_Collected Achieved Overpayment
10001 01/01/2022 12 01/09/2022 10000 2500 10000 12500
10001 01/01/2022 12 01/10/2022 10000 0 10000 2500
10001 01/01/2022 12 01/11/2022 10000 0 2500 0

Assuming that is a typo and my correction is correct then this SQL should give you what you want:

WITH DATA0 AS (
  SELECT 
    *, 
    NVL(
      LAG(AMOUNT_COLLECTED) OVER (
        PARTITION BY CONTRACT_ID 
        ORDER BY 
          COLLECTION_DUE_DATE ASC
      ), 
      0
    ) LAG_AMT_COLL 
  FROM 
    TRICKLE
), 
DATA1 AS (
  SELECT 
    *, 
    SUM(
      CASE WHEN amount_collected > 0 
      AND LAG_AMT_COLL = 0 THEN 1 ELSE 0 END
    ) over (
      PARTITION BY contract_id 
      ORDER BY 
        collection_due_date ASC ROWS BETWEEN unbounded preceding 
        AND CURRENT ROW
    ) AS TRACKER 
  FROM 
    DATA0
), 
DATA2 AS (
  SELECT 
    *, 
    SUM(TARGET) OVER (
      PARTITION BY CONTRACT_ID, 
      TRACKER 
      ORDER BY 
        COLLECTION_DUE_DATE ASC ROWS BETWEEN UNBOUNDED PRECEDING 
        AND CURRENT ROW
    ) TARGET_RUNNING, 
    SUM(AMOUNT_COLLECTED) OVER (
      PARTITION BY CONTRACT_ID, 
      TRACKER 
      ORDER BY 
        COLLECTION_DUE_DATE ASC ROWS BETWEEN UNBOUNDED PRECEDING 
        AND CURRENT ROW
    ) AMT_COLL_RUNNING, 
    SUM(AMOUNT_COLLECTED - TARGET) OVER (
      PARTITION BY CONTRACT_ID, 
      TRACKER 
      ORDER BY 
        COLLECTION_DUE_DATE ASC ROWS BETWEEN UNBOUNDED PRECEDING 
        AND CURRENT ROW
    ) BAL_RUNNING, 
    Lag(tracker) over (
      PARTITION BY contract_id 
      ORDER BY 
        collection_due_date ASC
    ) LAG_TRACKER 
  FROM 
    DATA1
), 
DATA3 AS (
  SELECT 
    *, 
    GREATEST(BAL_RUNNING, 0) BAL_RUNNING1 
  FROM 
    DATA2
) 
SELECT 
  CONTRACT_ID, 
  STARTING_MONTH, 
  CONTRACT_DURATION_IN_MONTHS, 
  COLLECTION_DUE_DATE, 
  TARGET, 
  AMOUNT_COLLECTED, 
  LEAST(
    TARGET, 
    NVL(
      LAG(BAL_RUNNING1) over (
        PARTITION BY contract_id 
        ORDER BY 
          collection_due_date ASC
      ), 
      0
    ) + AMOUNT_COLLECTED
  ) ACHIEVED, 
  Greatest(
    amt_coll_running - target_running, 
    0
  ) OVERPAYMENT 
FROM 
  DATA3 
ORDER BY 
  CONTRACT_ID, 
  COLLECTION_DUE_DATE
NickW
  • 8,430
  • 2
  • 6
  • 19