5

I found the following code for computing nCr, but don't understand the logic behind it. Why does this code work?

long long combi(int n,int k)
{
    long long ans=1;
    k=k>n-k?n-k:k;
    int j=1;
    for(;j<=k;j++,n--)
    {
        if(n%j==0)
        {
            ans*=n/j;
        }else
        if(ans%j==0)
        {
            ans=ans/j*n;
        }else
        {
            ans=(ans*n)/j;
        }
    }
    return ans;
}
Justin R.
ankitbisla21
  • What specific questions do you have about the logic? It seems like a mouthful for us unless you give us some pointers about what aspect of the logic you're looking for. – La-comadreja Jun 18 '14 at 20:26
  • I don't follow the reasoning behind the code inside the for loop. How does this loop help compute nCr for a given n and r? – ankitbisla21 Jun 18 '14 at 20:44
  • @La-comadreja I don't think ~15 lines of code are too much. The logic itself is very compact. – Niklas B. Jun 19 '14 at 13:20
  • If you replace the loop body with `ans = ans/j*n + ans%j*n/j;`, the code overflows only if the result does too. The code in the question overflows already for 62C30=450883717216034179 (it returns -164007751907617541). You can also get a little more precision if you make all variables unsigned. – Řrřola Dec 03 '15 at 13:19
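The loop body suggested in the last comment can be dropped into the function from the question as sketched below. The name combi_split is made up for this illustration, and the precondition 0 <= k <= n is an assumption that the original code does not check.

```c
#include <assert.h>

/* Hypothetical variant of combi() that uses the split-remainder loop body
   from the comment above. Assumes 0 <= k <= n. */
long long combi_split(int n, int k)
{
    assert(k >= 0 && k <= n);
    long long ans = 1;
    if (k > n - k)
        k = n - k;                  /* symmetry: nCk == nC(n-k) */
    for (int j = 1; j <= k; j++, n--) {
        /* ans*n/j == (ans/j)*n + ((ans%j)*n)/j. The second term is exact
           because ans*n is always a multiple of j at this point. */
        ans = ans / j * n + ans % j * n / j;
    }
    return ans;
}
```

Neither term is ever larger than the value of the step itself, so with these inputs combi_split(62, 30) should give 450883717216034179, the value quoted in the comment.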

4 Answers

9

That's clever code!

In general it aims to calculate the following formula:

ans = n! / (k! * (n-k)!)

It is equal to:

ans = n(n-1)(n-2)...(n-k+1)(n-k)...1 / (k(k-1)...1 * (n-k)(n-k-1)...1)

And after cancelling (n-k)! from the numerator and the denominator:

ans = n(n-1)(n-2)...(n-k+1) / k!

Now notice that the numerator and the denominator have the same number of factors (k each).

So the calculation of ans will be like the following:

ans  = 1 // initially
ans *= n/1
ans *= (n-1)/2
ans *= (n-2)/3
.
.
.
ans *=  (n-k+1)/k
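Written as a C loop, the scheme above becomes a running product in which the multiplication has to come before the division. This is only a minimal sketch: combi_naive is a made-up name and 0 <= k <= n is assumed.

```c
#include <assert.h>

/* Hypothetical direct translation of the scheme above.
   Assumes 0 <= k <= n. */
long long combi_naive(int n, int k)
{
    assert(k >= 0 && k <= n);
    long long ans = 1;
    for (int j = 1; j <= k; j++, n--) {
        /* Multiply first: after this step ans is the binomial coefficient
           "original n choose j", so the division by j is exact.
           Writing ans *= n / j in C would truncate n / j instead. */
        ans = ans * n / j;
    }
    return ans;
}
```

The intermediate product ans * n can be up to j times larger than the value the step ends with, which is the growth the three branches in the posted code try to avoid.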

Take another look at the code and you will notice that:

  1. ans is multiplied by n at each iteration
  2. n is reduced by 1 at each iteration (n--)
  3. ans is divided by j at each iteration

This is exactly what the posted code does. Now let's look at what the different conditions in the loop mean. The numerator factors run from n downwards and the denominator factors run from 1 to k, so the variable j plays the role of the denominator.

1) if(n%j==0)

At each step, if n is divisible by j (that is, n/j is an exact integer), we compute n/j first and then multiply it into ans. This keeps the intermediate value as small as possible.

2) else if(ans%j==0)

At each step, if n/j is not exact but ans/j is, we can instead write:

ans /= j; //first we divide
ans *= n; //then we multiply

This again keeps the intermediate value as small as possible.

3) last condition

At each step, if neither n/j nor ans/j is exact, we are not lucky enough to divide first and then multiply (which would keep the value small). We still have to carry on, and we are left with only one choice:

ans *= n; // multiply first
ans /= j; // then divide

ET VOILA!
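To see that the third case really does occur, the three branches can be instrumented. The sketch below uses a made-up helper, first_fallback_step, and assumes 0 <= k <= n. It reports the first j at which neither n nor ans is divisible by j.

```c
#include <assert.h>

/* Hypothetical helper: returns the first j at which the loop has to fall
   back to "multiply first, then divide", or 0 if that never happens.
   Assumes 0 <= k <= n. */
int first_fallback_step(int n, int k)
{
    assert(k >= 0 && k <= n);
    long long ans = 1;
    if (k > n - k)
        k = n - k;
    for (int j = 1; j <= k; j++, n--) {
        if (n % j == 0) {
            ans *= n / j;           /* case 1: divide n first */
        } else if (ans % j == 0) {
            ans = ans / j * n;      /* case 2: divide ans first */
        } else {
            return j;               /* case 3: no exact division available */
        }
    }
    return 0;
}
```

For example, first_fallback_step(37, 8) returns 4: at that step n is 34 and ans is 7770, and neither is a multiple of 4. For the small case (7, 3) the third case is never needed and the helper returns 0.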

Example: consider the case 3C7 (that is, combi(7, 3)). We know that the answer is 7! / (3! * 4!), hence: ans = (7*6*5) / (1*2*3)

Let's see what happens at each iteration:

//1 
ans = 1

//2 
n = 7
j = 1
ans = ans * n/j 
first compute 7/1 = 7
then multiply to ans
ans = 1*7
ans = 7

//3
n = 6
j = 2
ans = ans* n/j

evaluate n/j = 6/2 (can be divided)
         n/j = 3
ans = ans *(n/j)
    = 7 * 3
    = 21

// 4
n = 5
j = 3

ans = ans * n/j
evaluate n/j = 5/3, not an integer (first if fails)
evaluate ans/j = 21/3 = 7, exact (second if applies)

ans = (ans/j)*n
    = 7*5
    = 35

// end iterations

Note that if we calculated the last iteration in the straightforward way, we would get:

ans = ans*n/j
    = 21 * 5 / 3
    = 105 / 3
    = 35

Yes, it finds the right result, but along the way the value climbs to 105 before coming back down to 35. Now imagine doing this with really large numbers!
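The 105 versus 35 comparison can be reproduced by recording the largest value that ans takes in each approach. This is a sketch with made-up helper names; it assumes 0 <= k <= n and inputs small enough not to overflow, and it leaves out the k = min(k, n-k) step for brevity.

```c
#include <assert.h>

/* Largest value ans takes when every step multiplies first and then
   divides. Hypothetical helper, assumes 0 <= k <= n. */
long long peak_multiply_first(int n, int k)
{
    assert(k >= 0 && k <= n);
    long long ans = 1, peak = 1;
    for (int j = 1; j <= k; j++, n--) {
        ans *= n;
        if (ans > peak) peak = ans;     /* product before the division */
        ans /= j;
    }
    return peak;
}

/* Largest value ans takes with the three branches of the posted code.
   Hypothetical helper, assumes 0 <= k <= n. */
long long peak_three_cases(int n, int k)
{
    assert(k >= 0 && k <= n);
    long long ans = 1, peak = 1;
    for (int j = 1; j <= k; j++, n--) {
        if (n % j == 0) {
            ans *= n / j;
        } else if (ans % j == 0) {
            ans = ans / j * n;
        } else {
            ans *= n;
            if (ans > peak) peak = ans; /* product before the division */
            ans /= j;
        }
        if (ans > peak) peak = ans;
    }
    return peak;
}
```

For (7, 3) the first helper returns 105 and the second returns 35, matching the walk-through above.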

Conclusion: this code computes binomial coefficients carefully, trying to keep the intermediate value as small as possible at each step. It does so by checking whether an exact integer division is possible before multiplying, and doing the division first when it is. As a result it can compute some very large kCn values that the straightforward code cannot handle because of overflow.

chouaib
  • I think the code in the else clause is not necessary. I'm pretty sure either ans or n are divisible by j. – shebang Jun 20 '14 at 06:53
  • @shebang try `8C37` , exactly at iteration `j=4` – chouaib Jun 20 '14 at 07:28
  • 8C37 doesn't make sense :O – shebang Jun 20 '14 at 07:41
  • Ok if you say so. Try `combi(37, 8)` use a pencil and paper, you will see something when `j=4` @shebang – chouaib Jun 20 '14 at 07:44
  • Oh sorry, turns out I was wrong. Also about 8C37 not making sense, I'm used to the convention of nCr and not rCn. Another thing, as numbers become large, composite numbers become abundant. Perhaps the only piece of code required is the one in the else part since large numbers are more likely to be covered by that clause. Maybe a calculation involving GCDs might make it more large number friendly. – shebang Jun 20 '14 at 07:54
  • @shebang involving GCDs is a cool idea as well. I am only afraid of the difficulties to implement it and the raise of computation time. Still I like it & I'll try to implement it :) – chouaib Jun 20 '14 at 08:12
  • Thank You chouaib!This is what exactly i wanted to know :) – ankitbisla21 Jun 20 '14 at 16:01
  • @user3754037 do you mean that this is the accepted answer :) – chouaib Jun 21 '14 at 18:05
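The GCD idea raised in the comments could look roughly like the sketch below. It is not code from the thread: combi_gcd and gcd_ll are made-up names and 0 <= k <= n is assumed. After cancelling g = gcd(n, j), the reduced divisor j/g is coprime to n/g, so it must divide ans, and the division can always be done before the multiplication.

```c
#include <assert.h>

/* Greatest common divisor by Euclid's algorithm (arguments are positive
   where it is called below). */
static long long gcd_ll(long long a, long long b)
{
    while (b != 0) {
        long long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Hypothetical GCD-based variant of combi(). Assumes 0 <= k <= n. */
long long combi_gcd(int n, int k)
{
    assert(k >= 0 && k <= n);
    long long ans = 1;
    if (k > n - k)
        k = n - k;
    for (int j = 1; j <= k; j++, n--) {
        long long g  = gcd_ll(n, j);    /* factors shared by n and j */
        long long nn = n / g;
        long long jj = j / g;
        /* jj divides ans * nn and is coprime to nn, so jj divides ans */
        ans = ans / jj * nn;
    }
    return ans;
}
```

The cost is one Euclid loop per iteration on small int-sized values, and no intermediate value exceeds the value of the step itself.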
4

To answer the question in part, consider the fact that the entries of n choose k constitute Pascal's triangle. As Pascal's triangle is symmetric, it is sufficient to move the argument k into the left half, which is done with the

k=k>n-k?n-k:k;

statement; see the definition of C's conditional operator.

Furthermore, the result ans is initialized to 1, which is the first entry of every row in Pascal's triangle. This means that initially ans is in fact n choose 0, and after the loop iteration for a given j it is n choose j (with n being the original argument).
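The connection to Pascal's triangle can be checked directly: build a row by repeated addition and compare every entry with what combi() returns. In the sketch below, combi() is repeated from the question so that the example compiles on its own; matches_pascal_row is a made-up helper, and the limit n <= 60 is an assumption chosen so that nothing overflows long long.

```c
#include <assert.h>

/* The function from the question, repeated with only layout changes. */
long long combi(int n, int k)
{
    long long ans = 1;
    k = k > n - k ? n - k : k;
    for (int j = 1; j <= k; j++, n--) {
        if (n % j == 0)
            ans *= n / j;
        else if (ans % j == 0)
            ans = ans / j * n;
        else
            ans = (ans * n) / j;
    }
    return ans;
}

/* Hypothetical check: returns 1 if combi(n, k) agrees with row n of
   Pascal's triangle for every k, else 0. Assumes 0 <= n <= 60. */
int matches_pascal_row(int n)
{
    assert(n >= 0 && n <= 60);
    long long row[61] = {0};
    row[0] = 1;                         /* every row starts with 1 */
    for (int i = 1; i <= n; i++)
        for (int k = i; k >= 1; k--)
            row[k] += row[k - 1];       /* C(i,k) = C(i-1,k) + C(i-1,k-1) */
    for (int k = 0; k <= n; k++)
        if (combi(n, k) != row[k])
            return 0;
    return 1;
}
```

Because the comparison covers every k from 0 to n, it also exercises the symmetry step that moves k into the left half of the row.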

Codor
  • Yes, I know we only need to move k into one half, but I don't follow the reasoning behind the code inside the for loop. How would one come up with the three if-else statements? – ankitbisla21 Jun 18 '14 at 20:32
  • Please see the answer of ayusha below. In each iteration, the algorithm multiplies `ans` with `n/j`, and three cases are distinguished. The first two are relatively easy to see - for the last one, one must remark that `ans` will be integral after the multiplication. In fact, the code would also work with only the last case, but `ans` would overflow more quickly. – Codor Jun 18 '14 at 21:08
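Signed overflow is undefined behaviour in C, so whether the last-case-only loop would overflow has to be tested before the multiplication rather than after it. The following sketch does that with a made-up helper, last_case_only_overflows, assuming 0 <= k <= n.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical check: returns 1 if a loop that always multiplies first
   (the last case only) would exceed LLONG_MAX for these arguments, else 0.
   Assumes 0 <= k <= n. */
int last_case_only_overflows(int n, int k)
{
    assert(k >= 0 && k <= n);
    long long ans = 1;
    if (k > n - k)
        k = n - k;
    for (int j = 1; j <= k; j++, n--) {
        if (ans > LLONG_MAX / n)    /* ans * n would not fit; n >= 1 here */
            return 1;
        ans = ans * n / j;
    }
    return 0;
}
```

With a 64-bit long long, this reports an overflow for (62, 30) but not for (60, 30).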
1

The fact is that nCk for 0<=k<=n/2 takes the same values as for n/2<=k<=n, because nCk = nC(n-k). So k is first changed so that it lies in the left half. One more thing: nCk means (n*(n-1)*...*(n-k+1))/(k*(k-1)*...*2*1), and the above code applies this formula iteratively.

ayusha
  • Nice answer; I can't figure it out completely; do you mean that the closed formula for `n choose k` is evaluated iteratively, and on-the-fly reductions of the fraction are used to diminish the rapid growth of values caused by the factorials? To put it another way - the factors of the fraction are suitably grouped such that only integral factors remain? – Codor Jun 18 '14 at 20:32
  • yes,you said correct, that's why we check if condition in the loop. – ayusha Jun 18 '14 at 20:39
  • I sort of understand the ideas, however I am not totally convinced; could you provide a more detailed explanation? I sort of see that by writing out the factorials of the closed formula, there are exactly `n` factors in the denominator and `n` in the numerator, where respectively the last `n-k` factors cancel out right away. – Codor Jun 18 '14 at 20:43
-2

Yes. [N choose K] reduces its factorials a lot, because the dividend and the divisor share many factors that cancel each other out (x/x = 1 for x > 0). The trick is to not calculate the large factorials at all, because such large values need more bits than the integer type has.

The first trick is to reduce the fraction before dividing. The second trick is to use a modulo test in a conditional to choose one of 3 operations for the current iteration. This could be done differently; a modulo test is a simple way to find out whether a division is exact.

You iteratively traverse Pascal's triangle. With each step that you take, you multiply by something.

There are 3 possible branches for every iterative step. Each of them multiplies the accumulator "ans" by n/j, only in a different order of operations; n/j is the factor between 2 neighbouring positions in Pascal's triangle. You always end up doing one multiplication per iteration, and end up at the binomial coefficient's value.

n is the row of Pascal's triangle that you want to know about. You accumulate a factor of n in each iteration, while reducing n by 1 (n=n-1) for each iteration.

j=1;

ans=1; //must start at 1; starting at 0 would keep the product at 0

//within each iteration:

ans=ans*n;

n=n-1;

ans=ans/j;

j=j+1;

Multiplying first and then dividing can often be avoided (because there are a lot of shared prime factors in Pascal's triangle): when the modulo test shows that n or ans is divisible by j, the division is done first, on the smaller value. This is what the modulo conditionals are for.

Pascal's triangle is extremely symmetric (on summing up its domains), therefore this works.

The difference between (partial) sums of columns of Pascal's triangle shows the symmetry that is important for the multiplications and divisions here.

Just watch some YouTube videos on the symmetries and identities of Pascal's triangle.

Brad Larson
ollj