I have a pandas DataFrame, I'm trying to (in pandas or DuckDB SQL) do the following on each iteration partitioned by CODE
, DAY
, and TIME
:
- Iterate on each row to calculate the sum total of the 2 previous
TRANSACTIONS
orTRANSACTIONS_FORECAST
values (the first non NULL value e.g.COALESCE(TRANSACTIONS, TRANSACTIONS_FORECAST
) - Calculate the sum total of the
TRANSACTIONS_WEEK
values. - Calculate the
TRANSACTION_FORECAST
value ofstep1 / step2 * TRANSACTIONS_WEEK
- Stop iteration as soon as
TRANSACTION_FORECAST
column values are populated.
Here's the DataFrame:
import pandas as pd
from io import StringIO
csv_string="""'CODE,DAY,TIME,WEEK,TRANSACTIONS,TRANSACTIONS_WEEK,SOURCE\nA,Monday,1,20,23,263154,actual\nA,Monday,1,21,16,246649,actual\nA,Monday,1,23,,244086.6208,forecast\nA,Monday,1,24,,243197.7547,forecast\nA,Monday,1,25,,235561.9992,forecast\nA,Monday,1,26,,231105.5393,forecast\nA,Monday,1,27,,232744.1484,forecast\nA,Monday,1,28,,238718.1522,forecast\nA,Monday,1,29,,234870.8116,forecast\nA,Monday,1,30,,230410.6348,forecast\nA,Monday,1,31,,229832.8125,forecast\nA,Monday,1,32,,227024.5631,forecast\nA,Monday,1,33,,226483.0862,forecast\nA,Monday,1,34,,229247.3648,forecast\nA,Monday,1,35,,221272.5875,forecast\nA,Monday,1,36,,250239.7494,forecast\nA,Monday,1,37,,263229.4532,forecast\nA,Monday,1,38,,252955.314,forecast\nA,Monday,1,39,,241695.9493,forecast\nA,Monday,1,40,,247447.6128,forecast\nA,Monday,1,41,,247364.4851,forecast\nA,Monday,1,42,,244082.4747,forecast\nA,Monday,1,43,,229432.3064,forecast\nA,Monday,1,44,,222934.6285,forecast\nA,Monday,1,45,,224727.4305,forecast\nA,Monday,1,46,,225616.1613,forecast\nA,Monday,1,47,,225950.7391,forecast\nA,Monday,1,48,,225553.2239,forecast\nA,Monday,1,49,,225523.3712,forecast\nA,Monday,1,50,,215116.1205,forecast\nA,Monday,1,51,,239592.5374,forecast\nA,Monday,1,52,,228592.4596,forecast\nB,Monday,1,20,29,263154,orders_base\nB,Monday,1,21,27,246649,orders_base\nB,Monday,1,23,,244086.6208,forecast\nB,Monday,1,24,,243197.7547,forecast\nB,Monday,1,25,,235561.9992,forecast\nB,Monday,1,26,,231105.5393,forecast\nB,Monday,1,27,,232744.1484,forecast\nB,Monday,1,28,,238718.1522,forecast\nB,Monday,1,29,,234870.8116,forecast\nB,Monday,1,30,,230410.6348,forecast\nB,Monday,1,31,,229832.8125,forecast\nB,Monday,1,32,,227024.5631,forecast\nB,Monday,1,33,,226483.0862,forecast\nB,Monday,1,34,,229247.3648,forecast\nB,Monday,1,35,,221272.5875,forecast\nB,Monday,1,36,,250239.7494,forecast\nB,Monday,1,37,,263229.4532,forecast\nB,Monday,1,38,,252955.314,forecast\nB,Monday,1,39,,241695.9493,forecast\nB,Monday,1,40,,247447.6128,forecast\nB,Monday,1,41,,247364.4851,forecast\nB,Monday,1,42,,244082.4747,forecast\nB,Monday,1,43,,229432.3064,forecast\nB,Monday,1,44,,222934.6285,forecast\nB,Monday,1,45,,224727.4305,forecast\nB,Monday,1,46,,225616.1613,forecast\nB,Monday,1,47,,225950.7391,forecast\nB,Monday,1,48,,225553.2239,forecast\nB,Monday,1,49,,225523.3712,forecast\nB,Monday,1,50,,215116.1205,forecast\nB,Monday,1,51,,239592.5374,forecast\nB,Monday,1,52,,228592.4596,forecast\nC,Saturday,2,19,173,259156,orders_base\nC,Saturday,2,20,179,263154,orders_base\nC,Saturday,2,21,185,246649,orders_base\nC,Saturday,2,22,162,225220,orders_base\nC,Saturday,2,23,,244086.6208,forecast\nC,Saturday,2,24,,243197.7547,forecast\nC,Saturday,2,25,,235561.9992,forecast\nC,Saturday,2,26,,231105.5393,forecast\nC,Saturday,2,27,,232744.1484,forecast\nC,Saturday,2,28,,238718.1522,forecast\nC,Saturday,2,29,,234870.8116,forecast\nC,Saturday,2,30,,230410.6348,forecast\nC,Saturday,2,31,,229832.8125,forecast\nC,Saturday,2,32,,227024.5631,forecast\nC,Saturday,2,33,,226483.0862,forecast\nC,Saturday,2,34,,229247.3648,forecast\nC,Saturday,2,35,,221272.5875,forecast\nC,Saturday,2,36,,250239.7494,forecast\nC,Saturday,2,37,,263229.4532,forecast\nC,Saturday,2,38,,252955.314,forecast\nC,Saturday,2,39,,241695.9493,forecast\nC,Saturday,2,40,,247447.6128,forecast\nC,Saturday,2,41,,247364.4851,forecast\nC,Saturday,2,42,,244082.4747,forecast\nC,Saturday,2,43,,229432.3064,forecast\nC,Saturday,2,44,,222934.6285,forecast\nC,Saturday,2,45,,224727.4305,forecast\nC,Saturday,2,46,,225616.1613,forecast\nC,Saturday,2,47,,225950.7391,forecast\nC,Saturday,2,48,,225553.2239,forecast\nC,Saturday,2,49,,225523.3712,forecast\nC,Saturday,2,50,,215116.1205,forecast\nC,Saturday,2,51,,239592.5374,forecast\nC,Saturday,2,52,,228592.4596,forecast'"""
df = pd.read_csv(StringIO(csv_string))
Here's the expected result:
csv_string_result="""'CODE,DAY,TIME,WEEK,TRANSACTIONS,TRANSACTIONS_WEEK,SOURCE,TRANSACTIONS_ROLLING_2_SUM,TRANSACTIONS_WEEK_ROLLING_2_SUM,TRANSACTIONS_FORECAST\nA,Monday,1,20,23,263154,actual,,,\nA,Monday,1,21,16,246649,actual,,,\nA,Monday,1,23,,244086.6208,forecast,39,509803,18.67266025\nA,Monday,1,24,,243197.7547,forecast,34.67266025,490735.6208,17.18300601\nA,Monday,1,25,,235561.9992,forecast,35.85566626,487284.3756,17.3332716\nA,Monday,1,26,,231105.5393,forecast,34.51627761,478759.7539,16.66159882\nA,Monday,1,27,,232744.1484,forecast,,,\nA,Monday,1,28,,238718.1522,forecast,,,\nA,Monday,1,29,,234870.8116,forecast,,,\nA,Monday,1,30,,230410.6348,forecast,,,\nA,Monday,1,31,,229832.8125,forecast,,,\nA,Monday,1,32,,227024.5631,forecast,,,\nA,Monday,1,33,,226483.0862,forecast,,,\nA,Monday,1,34,,229247.3648,forecast,,,\nA,Monday,1,35,,221272.5875,forecast,,,\nA,Monday,1,36,,250239.7494,forecast,,,\nA,Monday,1,37,,263229.4532,forecast,,,\nA,Monday,1,38,,252955.314,forecast,,,\nA,Monday,1,39,,241695.9493,forecast,,,\nA,Monday,1,40,,247447.6128,forecast,,,\nA,Monday,1,41,,247364.4851,forecast,,,\nA,Monday,1,42,,244082.4747,forecast,,,\nA,Monday,1,43,,229432.3064,forecast,,,\nA,Monday,1,44,,222934.6285,forecast,,,\nA,Monday,1,45,,224727.4305,forecast,,,\nA,Monday,1,46,,225616.1613,forecast,,,\nA,Monday,1,47,,225950.7391,forecast,,,\nA,Monday,1,48,,225553.2239,forecast,,,\nA,Monday,1,49,,225523.3712,forecast,,,\nA,Monday,1,50,,215116.1205,forecast,,,\nA,Monday,1,51,,239592.5374,forecast,,,\nA,Monday,1,52,,228592.4596,forecast,,,\nB,Monday,1,20,29,263154,orders_base,,,\nB,Monday,1,21,27,246649,orders_base,,,\nB,Monday,1,23,,244086.6208,forecast,56,509803,26.81202497\nB,Monday,1,24,,243197.7547,forecast,53.81202497,490735.6208,26.66805322\nB,Monday,1,25,,235561.9992,forecast,53.48007819,487284.3756,25.85322815\nB,Monday,1,26,,231105.5393,forecast,52.52128136,478759.7539,25.35292274\nB,Monday,1,27,,232744.1484,forecast,,,\nB,Monday,1,28,,238718.1522,forecast,,,\nB,Monday,1,29,,234870.8116,forecast,,,\nB,Monday,1,30,,230410.6348,forecast,,,\nB,Monday,1,31,,229832.8125,forecast,,,\nB,Monday,1,32,,227024.5631,forecast,,,\nB,Monday,1,33,,226483.0862,forecast,,,\nB,Monday,1,34,,229247.3648,forecast,,,\nB,Monday,1,35,,221272.5875,forecast,,,\nB,Monday,1,36,,250239.7494,forecast,,,\nB,Monday,1,37,,263229.4532,forecast,,,\nB,Monday,1,38,,252955.314,forecast,,,\nB,Monday,1,39,,241695.9493,forecast,,,\nB,Monday,1,40,,247447.6128,forecast,,,\nB,Monday,1,41,,247364.4851,forecast,,,\nB,Monday,1,42,,244082.4747,forecast,,,\nB,Monday,1,43,,229432.3064,forecast,,,\nB,Monday,1,44,,222934.6285,forecast,,,\nB,Monday,1,45,,224727.4305,forecast,,,\nB,Monday,1,46,,225616.1613,forecast,,,\nB,Monday,1,47,,225950.7391,forecast,,,\nB,Monday,1,48,,225553.2239,forecast,,,\nB,Monday,1,49,,225523.3712,forecast,,,\nB,Monday,1,50,,215116.1205,forecast,,,\nB,Monday,1,51,,239592.5374,forecast,,,\nB,Monday,1,52,,228592.4596,forecast,,,\nC,Saturday,2,19,173,259156,orders_base,,,\nC,Saturday,2,20,179,263154,orders_base,,,\nC,Saturday,2,21,185,246649,orders_base,,,\nC,Saturday,2,22,162,225220,orders_base,,,\nC,Saturday,2,23,,244086.6208,forecast,347,471869,179.4948544\nC,Saturday,2,24,,243197.7547,forecast,341.4948544,469306.6208,176.9648629\nC,Saturday,2,25,,235561.9992,forecast,356.4597173,487284.3756,172.319015\nC,Saturday,2,26,,231105.5393,forecast,349.283878,478759.7539,168.6053147\nC,Saturday,2,27,,232744.1484,forecast,,,\nC,Saturday,2,28,,238718.1522,forecast,,,\nC,Saturday,2,29,,234870.8116,forecast,,,\nC,Saturday,2,30,,230410.6348,forecast,,,\nC,Saturday,2,31,,229832.8125,forecast,,,\nC,Saturday,2,32,,227024.5631,forecast,,,\nC,Saturday,2,33,,226483.0862,forecast,,,\nC,Saturday,2,34,,229247.3648,forecast,,,\nC,Saturday,2,35,,221272.5875,forecast,,,\nC,Saturday,2,36,,250239.7494,forecast,,,\nC,Saturday,2,37,,263229.4532,forecast,,,\nC,Saturday,2,38,,252955.314,forecast,,,\nC,Saturday,2,39,,241695.9493,forecast,,,\nC,Saturday,2,40,,247447.6128,forecast,,,\nC,Saturday,2,41,,247364.4851,forecast,,,\nC,Saturday,2,42,,244082.4747,forecast,,,\nC,Saturday,2,43,,229432.3064,forecast,,,\nC,Saturday,2,44,,222934.6285,forecast,,,\nC,Saturday,2,45,,224727.4305,forecast,,,\nC,Saturday,2,46,,225616.1613,forecast,,,\nC,Saturday,2,47,,225950.7391,forecast,,,\nC,Saturday,2,48,,225553.2239,forecast,,,\nC,Saturday,2,49,,225523.3712,forecast,,,\nC,Saturday,2,50,,215116.1205,forecast,,,\nC,Saturday,2,51,,239592.5374,forecast,,,\nC,Saturday,2,52,,228592.4596,forecast,,,'"""
df_result = pd.read_csv(StringIO(csv_string_result))
Here's the DuckDB recursive CTE I attempted:
-- The problem with this is that terminal would kill the process after 10 mins or so
WITH RECURSIVE ROLLING_SUM AS (
SELECT CODE
, DAY
, TIME
, WEEK
, TRANSACTIONS
, TRANSACTIONS_WEEK
, CASE
WHEN TRANSACTIONS IS NULL
THEN SUM(TRANSACTIONS) OVER (
PARTITION BY CODE, DAY, TIME
ORDER BY WEEK
ROWS BETWEEN 2 PRECEDING AND 1 PRECEDING
)
END AS ROLLING_2_SUM_TRANSACTIONS
, CASE -- noticed don't actually need a CASE statement for this
WHEN TRANSACTIONS IS NULL
THEN SUM(TRANSACTIONS_WEEK) OVER (
PARTITION BY CODE, DAY, TIME
ORDER BY WEEK
ROWS BETWEEN 2 PRECEDING AND 1 PRECEDING
)
END AS ROLLING_2_SUM_TRANSACTIONS_WEEK
, ROLLING_2_SUM_TRANSACTIONS / ROLLING_2_SUM_TRANSACTIONS_WEEK AS ROLLING_2_TRANSACTIONS_PCT
, ROLLING_2_TRANSACTIONS_PCT * TRANSACTIONS_WEEK AS TRANSACTIONS_FORECAST
, SOURCE
FROM df
UNION ALL
SELECT CODE
, DAY
, TIME
, WEEK
, TRANSACTIONS
, TRANSACTIONS_WEEK
, CASE
WHEN COALESCE(TRANSACTIONS, TRANSACTIONS_FORECAST) IS NULL
THEN SUM(
COALESCE(TRANSACTIONS, TRANSACTIONS_FORECAST)
) OVER (
PARTITION BY CODE, DAY, TIME
ORDER BY WEEK
ROWS BETWEEN 2 PRECEDING AND 1 PRECEDING
)
END
, CASE
WHEN COALESCE(TRANSACTIONS, TRANSACTIONS_FORECAST) IS NULL
THEN SUM(TRANSACTIONS_WEEK) OVER (
PARTITION BY CODE, DAY, TIME
ORDER BY WEEK
ROWS BETWEEN 2 PRECEDING AND 1 PRECEDING
)
END
, ROLLING_2_SUM_TRANSACTIONS / ROLLING_2_SUM_TRANSACTIONS_WEEK AS ROLLING_2_TRANSACTIONS_PCT
, ROLLING_2_TRANSACTIONS_PCT * TRANSACTIONS_WEEK AS TRANSACTIONS_FORECAST
, SOURCE
FROM ROLLING_SUM
WHERE WEEK <= 52
)
SELECT *
FROM ROLLING_SUM
WHERE COALESCE(TRANSACTIONS, TRANSACTIONS_FORECAST) IS NOT NULL
Tried in pure pandas but it was quickly getting out of hand. Looked into itertuples
but not sure how to get a rolling sum doing that.
Help on this would be greatly appreciated.