Here's a pretty cool way to do it with MILP if you have OPTMODEL! This program:
- Adds each account to a bucket with a 1/0 binary flag. These are denoted by
b1
and b2
- Minimizes the absolute difference between balances
- Linearizes the absolute value function
- Constrains the difference between the two sizes to be within 3 observations
Data:
data have;
input account balance;
datalines;
9999 110
9998 111
9997 112
9996 113
9995 114
9994 115
9993 116
9992 117
9991 118
9990 119
;
run;
Optimization code:
proc optmodel;
set ACCOUNTS;
num balance{ACCOUNTS};
read data have into ACCOUNTS=[account] balance;
var b1{ACCOUNTS} binary,
b2{ACCOUNTS} binary,
z integer /* To linearize absolute value in the objective function */
;
/* n of buckets 1 and 2 */
impvar n1 = sum{a in ACCOUNTS} (1*b1[a]);
impvar n2 = sum{a in ACCOUNTS} (1*b2[a]);
/* Total balance of each bucket */
impvar total_balance1 = sum{a in ACCOUNTS} (b1[a]*balance[a]);
impvar total_balance2 = sum{a in ACCOUNTS} (b2[a]*balance[a]);
/* Difference of balances between buckets */
impvar dif = total_balance1 - total_balance2;
/* Linearize absolute value: abs(n1 - n2) must be <= 3 */
con n1 - n2 <= 3;
con -(n1-n2) <= 3;
/* Linearize absolute value in objective function */
con dif <= z;
con -dif <= z;
/* Must use all observations */
con n1 + n2 = sum{a in ACCOUNTS} (1);
/* Both cannot be 1 */
con use_once {a in ACCOUNTS}: (b1[a] + b2[a]) <= 1;
min total = z;
solve;
create data want
from [account] = ACCOUNTS
balance[account]
b1[account]
b2[account]
;
print balance b1 b2 n1 n2 dif total_balance1 total_balance2;
quit;
Then use a data step to separate them out:
data want1
want2
;
set want;
if(b1) then output want1;
else output want2;
drop b1 b2;
run;
Output Want1:
account balance
9998 111
9996 113
9994 115
9993 116
9991 118
Output Want2:
account balance
9999 110
9997 112
9995 114
9992 117
9990 119

Totally overkill, but a fun exercise.