I want to divide a table into two half such that the number of rows and sum of amount column is similar

Question

For example please find the below data:

enter image description here

account	balance
9999	110
9998	111
9997	112
9996	113
9995	114
9994	115
9993	116
9992	117
9991	118
9990	119

The output should be in such a way that there are 5 rows in Table_A and 5 Rows in Table_B and sum of balance column should almost be similar.

Want the output in SAS or PROC SQL.

I tried many ways in proc sql but not able to generate an output

Is the goal to have equal sums or equal N. In general both goals cannot be achieved at the same time. — Tom, Nov 25 '22 at 20:23

score 0 · Answer 1 · answered Nov 25 '22 at 19:36

Here is a base SAS solution.

You can sort by balance then output odd rows to one data set and even rows to the other. This should ensure that they have roughly equal number of observations, and the sum of balance should be as equal as possible.

* create the input data;
data have;
    input account $ balance;
    datalines;
9999 110
9998 111
9997 112
9996 113
9995 114
9994 115
9993 116
9992 117
9991 118
9990 119
;
run;

* sort by balance;
proc sort data=have;
    by balance;
run;

* output odd rows to split1, even rows to split2;
data split1 split2;
    set have;
    if mod(_n_,2) then output split1;
    else output split2;
run;

score 0 · Answer 2 · answered Nov 29 '22 at 19:07

Here's a pretty cool way to do it with MILP if you have OPTMODEL! This program:

Adds each account to a bucket with a 1/0 binary flag. These are denoted by b1 and b2
Minimizes the absolute difference between balances
Linearizes the absolute value function
Constrains the difference between the two sizes to be within 3 observations

Data:

data have;
    input account balance;
    datalines;
9999 110
9998 111
9997 112
9996 113
9995 114
9994 115
9993 116
9992 117
9991 118
9990 119
;
run;

Optimization code:

proc optmodel;
    set ACCOUNTS;

    num balance{ACCOUNTS};
    
    read data have into ACCOUNTS=[account] balance;

    var b1{ACCOUNTS} binary,
        b2{ACCOUNTS} binary,
        z integer /* To linearize absolute value in the objective function */
    ;

    /* n of buckets 1 and 2 */
    impvar n1 = sum{a in ACCOUNTS} (1*b1[a]);
    impvar n2 = sum{a in ACCOUNTS} (1*b2[a]);

    /* Total balance of each bucket */
    impvar total_balance1 = sum{a in ACCOUNTS} (b1[a]*balance[a]);
    impvar total_balance2 = sum{a in ACCOUNTS} (b2[a]*balance[a]);

    /* Difference of balances between buckets */
    impvar dif = total_balance1 - total_balance2;

    /* Linearize absolute value: abs(n1 - n2) must be <= 3 */
    con n1 - n2    <= 3;
    con -(n1-n2)   <= 3;

    /* Linearize absolute value in objective function */
    con dif  <= z;
    con -dif <= z;

    /* Must use all observations */
    con n1 + n2 = sum{a in ACCOUNTS} (1);

    /* Both cannot be 1 */
    con use_once {a in ACCOUNTS}: (b1[a] + b2[a]) <= 1;

    min total = z;

    solve;

    create data want
        from [account] = ACCOUNTS
             balance[account]
             b1[account]
             b2[account]
    ;

    print balance b1 b2 n1 n2 dif total_balance1 total_balance2;
quit;

Then use a data step to separate them out:

data want1
     want2
;
    set want;

    if(b1) then output want1;
        else output want2;

    drop b1 b2;
run;

Output Want1:

account balance
9998    111
9996    113
9994    115
9993    116
9991    118

Output Want2:

account balance
9999    110
9997    112
9995    114
9992    117
9990    119

Totally overkill, but a fun exercise.

I want to divide a table into two half such that the number of rows and sum of amount column is similar

2 Answers2