PostgreSQL 9.1
Business situation
Every month, there is a new batch of accounts given to a specific process. Every batch can be described by month, number of accounts and total balance of accounts. The goal of the process is to recover some of the balance back from customers. Each batch is than tracked separately on a monthly basis (amount recovered on each month since batch was transferred to the process).
Goal
My goal is to predict what amount will be recovered in the future.
Data definition
create table vintage_data (
granularity date, /* Month when account entered process*/
distance_in_months integer, /* Distance in months from date when accounts entered process*/
entry_accounts integer, /* Number of accounts that entered process in a given month*/
entry_amount numeric, /* Total amount for account that entered process in a given month*/
recovery_amount numeric /* Amount recovered in Nth month on accounts that entered process in a given month */
);
Sample data
insert into vintage_data values('2012-01-31',1,200,100000,1000);
insert into vintage_data values('2012-01-31',2,200,100000,2000);
insert into vintage_data values('2012-01-31',3,200,100000,3000);
insert into vintage_data values('2012-01-31',4,200,100000,3500);
insert into vintage_data values('2012-01-31',5,200,100000,3400);
insert into vintage_data values('2012-01-31',6,200,100000,3300);
insert into vintage_data values('2012-02-28',1,250,150000,1200);
insert into vintage_data values('2012-02-28',2,250,150000,1600);
insert into vintage_data values('2012-02-28',3,250,150000,1800);
insert into vintage_data values('2012-02-28',4,250,150000,1200);
insert into vintage_data values('2012-02-28',5,250,150000,1600);
insert into vintage_data values('2012-03-31',1,200,90000,1300);
insert into vintage_data values('2012-03-31',2,200,90000,1200);
insert into vintage_data values('2012-03-31',3,200,90000,1400);
insert into vintage_data values('2012-03-31',4,200,90000,1000);
insert into vintage_data values('2012-04-30',1,300,180000,1600);
insert into vintage_data values('2012-04-30',2,300,180000,1500);
insert into vintage_data values('2012-04-30',3,300,180000,4000);
insert into vintage_data values('2012-05-31',1,400,225000,2200);
insert into vintage_data values('2012-05-31',2,400,225000,6000);
insert into vintage_data values('2012-06-30',1,100,60000,1000);
Calculation Process
You can imagine the data as a triangular matrix (X values are to be forecasted):
distance_in_months 1 2 3 4 5 6
granularity entry_accounts entry_amount
2012-01-31 200 100000 1000 2000 3000 3500 3400 3300
2012-02-28 250 150000 1200 1600 1800 1200 1600 (X-1)
2012-03-31 200 90000 1300 1200 1400 1000 (X0) (X4)
2012-04-30 300 180000 1600 1500 4000 (X1) (X5) (X8)
2012-05-31 400 225000 2200 6000 (X2) (X6) (X9) (X11)
2012-06-30 100 60000 1000 (X3) (X7) (X10) (X12 (X13)
Algorithm
The goal I have is to forecast all the missing points (future). To illustrate the process, this is the calculation for the point X1
1) Get row totals for previous three months using distance up to 4:
2012-01-31 1000+2000+3000+3500=9500 (d4m3)
2012-02-28 1200+1600+1800+1200=5800 (d4m2)
2012-03-31 1300+1200+1400+1000=4900 (d4m1)
2) Get row totals for previous three months using distance up to 3:
2012-01-31 1000+2000+3000=6000 (d3m3)
2012-02-28 1200+1600+1800=4600 (d3m2)
2012-03-31 1300+1200+1400=3800 (d3m1)
3) Calculate weighted average running rate for distance 3 and distance 4 (weighted by entry_amount):
(d4m3+d4m2+d4m1)/(100000+150000+90000) = (9500+5800+4900)/(100000+150000+90000) = 20200/340000 = 0.0594
(d3m3+d3m2+d3m1)/(100000+150000+90000) = (6000+4600+3800)/(100000+150000+90000) = 14400/340000 = 0.0424
4) Calculate the change between distance 3 and distance 4
((d4m3+d4m2+d4m1)/(100000+150000+90000))/((d3m3+d3m2+d3m1)/(100000+150000+90000)) =
= (20200/340000)/(14400/340000) =
= 0.0594/0.0424 = 1.403 (PredictionRateForX1)
5) Calculate row totals for predicted month using distance up to 3:
2012-04-30 1600+1500+4000=7100
6) Calculate rate using entry_amount for predicted month
7100/180000 = 0.0394
7) Calculate rate predicted for X1
0.0394 * PredictionRateForX1 = 0.05534
8) Calculate amount for X1
(0.05534-0.0394)*180000 = 2869.2
Problem
The problem is how to calculate the rest of the matrix (from x-1 to x13) using SQL statement. It is obvious that this will require some sort of recursive algorithm.