Given an integer z<=10^100, find the smallest row of Pascal's triangle that contains z

Question

How can I find an algorithm to solve this problem using C++: given an integer z<=10^100, find the smallest row of Pascal's triangle that contains the number z.

For example if z=6 => result is on the 4th row.

Another way to describe the problem: given integer z<=10^100, find the smallest integer n: exist integer k so that C(k,n) = z.

C(k,n) is combination of n things taken k at a time without repetition

Yes I am sure :(. I have testdata of this problem z=56476362530291763837811509925185051642180136064700011445902684545741089307844616509330834616 => res=5864079763474581 — quangh, Apr 14 '14 at 10:50
I think that's rather a math problem than a programming one - by your example, any algorithm would have to perform in order of 5864079763474581 operations best-case. Which is quite long time. Better off asking on http://math.stackexchange.com/ — DarkWanderer, Apr 14 '14 at 10:58
do you consider only valid entries, coefficients of triangle? are you looking for exact match? — 4pie0, Apr 14 '14 at 10:58
@DarkWanderer I believe it is a computer programming problem and that the user is seeking an algorithm to solve it. Also your conclusion is not right. There may be an algorithm that is faster than linear to solve this problem. For instance to find the smallest number x such that phi(x) = K where k is a constant one performs way less then K operations. — Ivaylo Strandjev, Apr 14 '14 at 11:13
@Ivaylo Strandjev: feel free to share a *purely algorithmical solution, not relying on mathematical properties of PT* like relation to binomial coefficients etc. I agree that in general case it's not necessarily O(n), but for PT, as I envision, there would be at least root path check. — DarkWanderer, Apr 14 '14 at 11:44
@privatedatapublicchannel2 any number N will appear in the PT at the row N+1, so there's no "invalid" entries. — DarkWanderer, Apr 14 '14 at 11:47
@DarkWanderer Does "purely algorithmical solution" really make sense? I feel like most of smart algorithms rely on (more or less advanced) math properties. — user189, Apr 14 '14 at 11:53
@user189 Yes :) "Find Fibonacci number N" is O(N) (optimal) with dynamic programming, without any knowledge except fibonacci number definition — DarkWanderer, Apr 14 '14 at 11:56
@DarkWanderer This is thanks to properties (the most simple one being "you can compute f(n) from f(n-1) and f(n-2) in O(1) if n is bounded…") you deduce from the definition (you can do the same from the binomial definition). Maybe it will need more complex properties, maybe not. — user189, Apr 14 '14 at 12:00
`you can compute f(n) from f(n-1) and f(n-2)` is not a property, it's part of definition itself - and that's all that needed for the algorithm ;) — DarkWanderer, Apr 14 '14 at 12:06
How would the input be represented? The bound on the input seems quite large compared to native integral types. — Codor, Apr 14 '14 at 13:16
@Codor: there are C++ libraries like [this](https://mattmccutchen.net/bigint/) one — DarkWanderer, Apr 14 '14 at 13:59

pentadecagon · Accepted Answer · 2014-04-14T16:56:18.293

4

EDIT This solution needs Logarithmic time, it's O(Log z). Or maybe O( (Log z)^2 ).

Say you are looking for n,k where Binomial(n,k)==z for a given z.

Each row has its largest value in the middle, so starting from n=0 you increase the row number, n, as long as the middle value is smaller than the given number. Actually, 10^100 isn't that big, so before row 340 you find a position n0,k0=n0/2 where the value from the triangle is larger than or equal to z: Binomial(n0,k0)>=z
You walk to the left, i.e. you decrease the column number k, until eventually you find a value smaller than z. If there was a matching value in that row you would have hit it by now. k is not very large, less than 170, so this step won't be executed more often than that and does not present a performance problem.
From here you walk down, increasing n. Here you will find a steadily increasing value of Binomial[n,k]. Continue with 3 until the value gets bigger than or equal to z, then goto 2.

EDIT: This step 3 can loop for a very long time when the row number n is large, so instead of checking each n linearly you can do a binary search for n with Binomial(n,k) >= z > Binomial(n-1,k), then it only needs Log(n) time.

A Python implementation looks like this, C++ is similar but somewhat more cumbersome because you need to use an additional library for arbitrary precision integers:

# Calculate (n-k+1)* ... *n
def getnk( n, k ):
    a = n
    for u in range( n-k+1, n ):
        a = a * u
    return a

# Find n such that Binomial(n,k) >= z and Binomial(n-1,k) < z
def find_n( z, k, n0 ):
    kfactorial = k
    for u in range(2, k):
        kfactorial *= u

    xk = z * kfactorial            

    nk0 = getnk( n0, k )
    n1=n0*2
    nk1 = getnk( n1, k )

    # duplicate n while the value is too small
    while nk1 < xk:
        nk0=nk1
        n0=n1
        n1*=2
        nk1 = getnk( n1, k )
    # do a binary search
    while n1 > n0 + 1:
        n2 = (n0+n1) // 2
        nk2 = getnk( n2, k )
        if nk2 < xk:
            n0 = n2
            nk0 = nk2
        else:
            n1 = n2
            nk1 = nk2

    return n1, nk1 // kfactorial


def find_pos( z ):
    n=0
    k=0
    nk=1

    # start by finding a row where the middle value is bigger than z
    while nk < z:
        # increase n
        n = n + 1
        nk = nk * n // (n-k)
        if nk >= z:
            break
        # increase both n and k
        n = n + 1
        k = k + 1
        nk = nk * n // k

    # check all subsequent rows for a matching value
    while nk != z:
        if nk > z:
            # decrease k
            k = k - 1
            nk = nk * (k+1) // (n-k)
        else:
            # increase n
            # either linearly
            #  n = n + 1
            #  nk = nk * n // (n-k)
            # or using binary search:
            n, nk = find_n( z, k, n )
    return n, k

z = 56476362530291763837811509925185051642180136064700011445902684545741089307844616509330834616
print( find_pos(z) )

It should print

(5864079763474581, 6)

edited Apr 14 '14 at 16:56

answered Apr 14 '14 at 12:42

pentadecagon

4,717
2
18
26

what is an answer here? which row is it, is this 3113? I think this is not correct – 4pie0 Apr 14 '14 at 13:26
Yes, it would be 3113, because x==Binomial(3113, 47). There is no smaller row containing that number. – pentadecagon Apr 14 '14 at 13:56
definitely there is smaller row – 4pie0 Apr 14 '14 at 13:57
Which one? Keep in mind that we are looking for an exact match. – pentadecagon Apr 14 '14 at 13:59
@privatedatapublicchannel2 Again: Which one? For which `n` and `k` you get `C(n,k)==x`? – pentadecagon Apr 14 '14 at 14:07
1

Score one. I have a bad feeling, though - this algorithm would take *really* long time for the testing data in question (n = 5864079763474581). Also, it's linear @user189 ;) – DarkWanderer Apr 14 '14 at 14:08
@DarkWanderer Have you any idea how it can be solved ? The only thing I understand is that answer should also be in BigInt type, and I don't really know how to calculate it in non-linear time. – Ralor Apr 14 '14 at 14:32
@pentadecagon: thank you, but your algorithm take very long time to run this test z=93863470374298566037291090778292905378244184695975285947107426922806381327246206906989514 the ans= 93863470374298566037291090778292905378244184695975285947107426922806381327246206906989514 anyway, thank you :( – quangh Apr 14 '14 at 14:56
1

@user3531460 Fixed by doing binary search. Now it should be quick for any input. – pentadecagon Apr 14 '14 at 14:59
@DarkWanderer Now I do a binary search, and as a result the algorithm has Logarithmic complexity. Unfortunately the code got *much* longer and less readable. – pentadecagon Apr 14 '14 at 15:02
my algorithm will skip 337 rows maybe improving what you've desribed in step 1, this approach should be taken to get starting position from where to search – 4pie0 Apr 14 '14 at 15:14
@privatedatapublicchannel2 Your algorithm doesn't skip any rows, it checks all rows linearly. One certainly could optimize this part, but in general the first 337 rows only present a small fraction of the total runtime, so any further optimizations at this point are hardly worthwhile. – pentadecagon Apr 14 '14 at 15:19
@pentadecagon yes, it skips evaluating the coefficients up to result that it returns – 4pie0 Apr 14 '14 at 15:21
@pentadecagon Hat off, wish I could upvote twice. Good job. – DarkWanderer Apr 14 '14 at 18:10

4pie0 · Answer 2 · 2014-04-14T17:54:39.453

Stirling estimation for n! can be used to find first row in triangle with binomial coefficient bigger or equal to a given x. Using this estimation we can derive lower and upper bound for

enter image description here

and then by observation that this is the maximum coefficient in row that expands 2n:

P( 2n, 0), P( 2n, 1), P( 2n, 2), ..., P( 2n, 2n -1), P( 2n, 2n)

we can find first row with maximum binomial coefficient bigger or equal to a given x. This is the first row in which x can be looking for, this is not possible that x can be found in the row smaller than this. Note: this may be right hint and give an answer immediately in some cases. At the moment I cannot see other way than to start a brute force search from this row.

template <class T>
T binomial_coefficient(unsigned long n, unsigned long k) {
    unsigned long i;
    T b;
    if (0 == k || n == k) {
        return 1;
    }
    if (k > n) {
        return 0;
    }
    if (k > (n - k)) {
        k = n - k;
    }
    if (1 == k) {
        return n;
    }
    b = 1;
    for (i = 1; i <= k; ++i) {
        b *= (n - (k - i));
        if (b < 0) return -1; /* Overflow */
        b /= i;
    }
    return b;
}

Stirling:

double stirling_lower_bound( int n) {
    double n_ = n / 2.0;
    double res = pow( 2.0, 2 * n_);
    res /= sqrt( n_ * M_PI);
    return res * exp( ( -1.0) / ( 6 * n_));
}

double stirling_upper_bound( int n) {
    double n_ = n / 2.0;
    double res = pow( 2.0, 2 * n_) ;
    res /= sqrt( n_ * M_PI);
    return res * exp( 1.0 / ( 24 * n_));
}

int stirling_estimate( double x) {
    int n = 1;
    while ( stirling_lower_bound( n) <= x) {
        if ( stirling_upper_bound( n) > x) return n;
        ++n;
    }
    return n;
}

usage:

long int search_coefficient( unsigned long int &n, unsigned long int x) {
    unsigned long int k = n / 2;
    long long middle_coefficient = binomial_coefficient<long long>( n, k);
    if( middle_coefficient == x) return k;

    unsigned long int right = binomial_coefficient<unsigned long>( n, ++k);
    while ( x != right) {

        while( x < right ||  x < ( right * ( n + 1) / ( k + 1))) {
            right = right * ( n + 1) / ( ++k) - right;
        }
        if ( right == x) return k;
        right = right * ( ++n) / ( ++k);
        if( right > x) return -1;
    }
    return k;
}


/*
 * 
 */
int main(int argc, char** argv) {

    long long x2 = 1365;
    unsigned long int n = stirling_estimate( x2);
    long int k = search_coefficient( n, x2);
    std::cout << "row:" << n <<", column: " << k; 
    return 0;
}

output:

row:15, column: 11

Again, incorrect results for number 8: the estimation is 5, but the value is actually at row 9. Doesn't help us at all. — DarkWanderer, Apr 14 '14 at 14:02
@DarkWanderer this returns correctly 5 but this is not what you think it should return — 4pie0, Apr 14 '14 at 14:08
If you know *what* it should return, share; don't play "secret knowledge" game. It's not constructive. — DarkWanderer, Apr 14 '14 at 14:10
@DarkWanderer please read this answer again, "first row with maximum binomial coefficient bigger or equal to a given x." — 4pie0, Apr 14 '14 at 14:12
What's the big O of this? I don't think it'd be very good, let alone data types won't work for OP's scales. — DarkWanderer, Apr 14 '14 at 18:09
well, faster that accepted solution as Stirling estimation reduces initial steps to find start row — 4pie0, Apr 14 '14 at 20:23

Given an integer z<=10^100, find the smallest row of Pascal's triangle that contains z

2 Answers2