2

This is a question about the function nchoosek in Matlab.

I want to find nchoosek(54,25), which is the same as 54C25. Since the answer is about 10^15, I originally use int64. However the answer is wrong with respect to the symbolic one.

Input:

nchoosek(int64(54),int64(25))
nchoosek(sym(54),sym(25))

Output:

1683191473897753
1683191473897752

You can see that they differ by one. This is not really an urgent problem since I now use sym. However can someone tell me why this happens?


EDIT:

I am using R2013a.

I take a look at the nchoosek.m, and find that if the input are in int64, the code can be simplified into

function c = nchoosek2(v,k)

    n = v;  % rename v to be n. the algorithm is more readable this way.

    classOut = 'int64';
    nd = double(n);
    kd = double(k);
    nums = (nd-kd+1):nd;
    dens = 1:kd;
    nums = nums./dens;      %%
    c = round(prod(nums));
    c = cast(c,classOut);
end

However, the outcome of int64(prod(nums./dens)) is different from prod(sym(nums)./sym(dens)) for me. Is this the same for everyone?

Amro
  • 123,847
  • 25
  • 243
  • 454
k99731
  • 127
  • 5
  • Do you see warning that the result might not be exact in first case? – Ilya Oct 14 '14 at 04:18
  • no. no warning is shown. – k99731 Oct 14 '14 at 04:20
  • In the documentation it is said, that "the result is only accurate to 15 digits for double-precision inputs, or 8 digits for single-precision inputs". But also it is said, that MATLAB displays a warning that the result might not be exact if output is too big. So, if there are no warnings, I am not sure that it is your case. – Ilya Oct 14 '14 at 04:46
  • I am afraid that I cannot reproduce this. The code example by you works completely fine for me. The 2 different calculations are accurate up to 18 digits. This is however expected since the largest int64 can be 2^63-1. MATLAB 2014a. – patrik Oct 14 '14 at 06:44
  • 1
    I am using R2013a, maybe this is the reason of the problem – k99731 Oct 14 '14 at 08:06
  • 1
    Seems version dependent, I can reproduce this issuse on 2013a – RTL Oct 14 '14 at 08:34

2 Answers2

2

I don't have this problem on R2014a:

Numeric

>> n = int64(54);
>> k = int64(25);
>> nchoosek(n,k)
ans =
     1683191473897752    % class(ans) == int64

Symbolic

>> nn = sym(n);
>> kk = sym(k);
>> nchoosek(nn,kk)
ans =
1683191473897752         % class(ans) == sym

% N!/((N-K)! K!)
>> factorial(nn) / (factorial(nn-kk) * factorial(kk))
ans =
1683191473897752         % class(ans) == sym

If you check the source code of the function edit nchoosek.m, you'll see it specifically handles the case of 64-bit integers using a separate algorithm. I won't reproduce the code here, but here are the highlights:

function c = nchoosek(v,k)
    ...

    if int64type
        % For 64-bit integers, use an algorithm that avoids
        % converting to doubles
        c = binCoef(n,k,classOut);
    else
        % Do the computation in doubles.
        ...
    end

    ....
end

function c = binCoef(n,k,classOut)
    % For integers, compute N!/((N-K)! K!) using prime factor cancellations
    ...
end
Amro
  • 123,847
  • 25
  • 243
  • 454
  • Do you get the same result from int64(prod(nums./dens)) and prod(sym(nums)./sym(dens)) as stated in the edit? – k99731 Oct 14 '14 at 09:03
  • @k99731: As RTL explained, I think something has changed in the `nchoosek.m` function between your version and the latest release. – Amro Oct 14 '14 at 15:45
1

In 2013a this can be reproduced...

There is as @Amro shows a special case in nchoosek for classOut of int64 or unit64,
however in 2013a this is only applied when the answer is between

  • flintmax (with no argument) and
  • double(intmax(classOut)) + 2*eps(double(intmax(classOut)))

which for int64 gives 9007199254740992 & 9223372036854775808, which the solution does not lie between...


If the solution had fallen between these values it would be recalculated using the subfunction binCoef for which the help states: For integers, compute N!/((N-K)! M!) using prime factor cancellations

The binCoef function would have produced the right answer for the given int64 inputs

In 2013a with these inputs binCoef is not called

Instead the "default" pascals triangle method is used in which:

  • Inputs are cast to double
  • The product of the vector ((n-k+1):n)./(1:k) is taken
  • this vector contains k double representations of fractions.

So what we have is almost certainly floating point error.


What can be done?

Two options I can see;

  1. Make your own function based on the code in binCoef,
  2. Modify nchoosek and remove && c >= flintmax from line 81

Removing this expression will force Matlab to use the more accurate integer based calculation for inputs of int64 and uint64 for any values within their precision. This will be slightly slower but will avoid floating point errors, which are rightfully unexpected when working with integer types.

Option one - should be fairly straight forward...

Option two - I recommend keeping an unchanged backup of the original function, or makeing a copy of the function with the modification and use that instead.

RTL
  • 3,577
  • 15
  • 24
  • I don't have R2013a at hand to check, but I can see a remark in the [release notes](http://www.mathworks.com/help/matlab/release-notes.html#btnmv_x-1) of R2013a stating: `Integer type support for [...], and number theory functions. The following functions now support inputs of any integer data type: [...] and nchoosek.` I just compared `nchoosek.m` between R2014a and R2013b and the two files are identical. So the change must have occurred between R2013b and R2013a.. – Amro Oct 14 '14 at 15:39
  • 1
    That fills the gap... I assume then that integer support was *added* in 2013a and the problem found in this question was *fixed* for 2013b. I have access to 2013a and 2014a and this is the only obvious difference between the versions of nchoosek, specifically lines 77-80 in 2014a are not present in 2013a, instead (after casting to double and using pascals triangle) if the solution is in the range specified above it recalculates using binCoef (which is unchanged), this is handled on lines 80-85 in 2013a, which appears inbetween the equivalent lines 93 and 94 in 2014a (as elseif... ). – RTL Oct 14 '14 at 15:58