I am currently struggling with converting my data into a useful dataset. I need to evenly distribute payments from the first month up to the last month. The problem is that payments are inconsistent and unequal. Also, there are payments that have been fully paid and should be distributed from the first payment plus the term it is applicable based on the agreement dataframe.
My tables are the following:
1st table: payments
cust_id | agreement_id | date | payment |
---|---|---|---|
1 | A | 12/1/20 | 200 |
1 | A | 2/2/21 | 200 |
1 | A | 2/3/21 | 100 |
1 | A | 5/1/21 | 200 |
1 | B | 1/2/21 | 50 |
1 | B | 1/9/21 | 20 |
1 | B | 3/1/21 | 80 |
1 | B | 4/23/21 | 90 |
2 | C | 1/21/21 | 600 |
3 | D | 3/4/21 | 150 |
3 | D | 5/3/21 | 150 |
Here's the code for the payments dataframe:
payments = pd.DataFrame.from_dict({'cust_id': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 2, 9: 3, 10: 3},
'agreement_id': {0: 'A', 1: 'A', 2: 'A', 3: 'A', 4: 'B', 5: 'B', 6: 'B', 7: 'B',
8: 'C', 9: 'D', 10: 'D'},
'date': {0: '12/1/20', 1: '2/2/21', 2: '2/3/21', 3: '5/1/21', 4: '1/2/21', 5: '1/9/21',
6: '3/1/21', 7: '4/23/21', 8: '1/21/21', 9: '3/4/21', 10: '5/3/21'},
'payment': {0: 200, 1: 200, 2: 100, 3: 200, 4: 50, 5: 20, 6: 80, 7: 90, 8: 600, 9: 150, 10: 150}})
2nd table: agreement
agreement_id | activation | term_months | total_fee |
---|---|---|---|
A | 12/1/20 | 24 | 4800 |
B | 1/21/21 | 6 | 600 |
C | 1/21/21 | 6 | 600 |
D | 3/4/21 | 6 | 300 |
Here's the code for the agreement dataframe:
agreement = pd.DataFrame.from_dict({'agreement_id': {0: 'A', 1: 'B', 2: 'C', 3: 'D'}, 'activation': {0: '12/1/20', 1: '1/2/21', 2: '1/21/21', 3: '3/4/21'}, 'term_months': {0: 24, 1: 6, 2: 6, 3: 6}, 'total_fee': {0: 4800, 1: 300, 2: 600, 3: 300}})
The results I wanted is as follows:
cust_id | agreement_id | date | payment |
---|---|---|---|
1 | A | 12/1/20 | 116.67 |
1 | A | 1/1/21 | 116.67 |
1 | A | 2/1/21 | 116.67 |
1 | A | 3/1/21 | 116.67 |
1 | A | 4/1/21 | 116.67 |
1 | A | 5/1/21 | 116.67 |
1 | B | 1/1/21 | 60 |
1 | B | 2/1/21 | 60 |
1 | B | 3/1/21 | 60 |
1 | B | 4/1/21 | 60 |
2 | C | 1/1/21 | 100 |
2 | C | 2/1/21 | 100 |
2 | C | 3/1/21 | 100 |
2 | C | 4/1/21 | 100 |
2 | C | 5/1/21 | 100 |
2 | C | 6/1/21 | 100 |
3 | D | 3/1/21 | 50 |
3 | D | 4/1/21 | 50 |
3 | D | 5/1/21 | 50 |
3 | D | 6/1/21 | 50 |
3 | D | 7/1/21 | 50 |
3 | D | 8/1/21 | 50 |
Or, in code form:
cust_id agreement_id date payment
0 1 A 12/1/20 116.67
1 1 A 1/1/21 116.67
2 1 A 2/1/21 116.67
3 1 A 3/1/21 116.67
4 1 A 4/1/21 116.67
5 1 A 5/1/21 116.67
6 1 B 1/1/21 60.00
7 1 B 2/1/21 60.00
8 1 B 3/1/21 60.00
9 1 B 4/1/21 60.00
10 2 C 1/1/21 100.00
11 2 C 2/1/21 100.00
12 2 C 3/1/21 100.00
13 2 C 4/1/21 100.00
14 2 C 5/1/21 100.00
15 2 C 6/1/21 100.00
16 3 D 3/1/21 50.00
17 3 D 4/1/21 50.00
18 3 D 5/1/21 50.00
19 3 D 6/1/21 50.00
20 3 D 7/1/21 50.00
21 3 D 8/1/21 50.00
Activation is the same with the date of first payment.
I tried using the following code (suggested by AlexK) that creates another column but it is only applicable when the total payments are less than the total fee. But, when the total payments are equal to the total fee, I need to distribute the payments accordingly from the start of the payment until the end of the month (start plus terms in months).
payments['date'] = pd.to_datetime(payments['date'])
resampled_payments = (payments
.set_index('date')
.groupby(['cust_id', 'agreement_id'])
.resample('MS')
.agg({'payment': sum})
.reset_index()
)
resampled_payments['avg_monthly_payment'] = (resampled_payments
.groupby(['cust_id', 'agreement_id'])['payment']
.transform('mean')
)