Concatenation of binary representation of first n positive integers in O(logn) time complexity

Question

I came across this question in a coding competition. Given a number n, concatenate the binary representations of the first n positive integers and return the decimal value of the resulting number. Since the answer can be large, return it modulo 10^9+7. n can be as large as 10^9. For example, for n=4 the number formed is 11011100 (1=1, 10=2, 11=3, 100=4), and the decimal value of 11011100 is 220.

I found a Stack Overflow answer to this question, but it only contains an O(n) solution. Link: concatenate binary of first N integers and return decimal value

Since n can be up to 10^9, we need a solution that is better than O(n).

@Arafat Siddiqui I missed `answer modulo 10^9+7` requirement — MBo, Oct 21 '20 at 05:42
Does this answer your question? [concatenate binary of first N integers and return decimal value](https://stackoverflow.com/questions/62631840/concatenate-binary-of-first-n-integers-and-return-decimal-value) — duplex143, Apr 25 '21 at 11:02

Mark Dickinson · Accepted Answer · 2020-10-21T20:37:07.883

Here's some Python code that provides a fast solution; it uses the same ideas as in Abhinav Mathur's post. It requires Python >= 3.8, but it doesn't use anything particularly fancy from Python, and could easily be translated into another language. You'd need to write algorithms for modular exponentiation and modular inverse if they're not already available in the target language.
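For anyone translating to a language without these built in, here is a minimal sketch of the two helpers mentioned above. The names mod_pow and mod_inverse are made up for illustration, and the inverse relies on the assumption that the modulus is prime (10^9+7 is), using Fermat's little theorem rather than the extended Euclidean algorithm.

```python
M = 10**9 + 7  # prime modulus from the question


def mod_pow(base, exponent, modulus):
    """Compute base**exponent % modulus by repeated squaring (exponent >= 0)."""
    result = 1
    base %= modulus
    while exponent > 0:
        if exponent & 1:                    # current low bit is set
            result = result * base % modulus
        base = base * base % modulus        # square for the next bit
        exponent >>= 1
    return result


def mod_inverse(x, modulus):
    """Inverse of x modulo a PRIME modulus (assumes x is not a multiple of it)."""
    return mod_pow(x, modulus - 2, modulus)
```

In Python itself these correspond to pow(base, exponent, modulus) and pow(x, -1, modulus), which is what the code below uses.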

First, for testing purposes, let's define the slow and obvious version:

# Modulus that results are reduced by.
M = 10 ** 9 + 7


def slow_binary_concat(n):
    """
    Concatenate binary representations of 1 through n (inclusive).

    Reinterpret the resulting binary string as an integer.
    """
    concatenation = "".join(format(k, "b") for k in range(1, n + 1))
    return int(concatenation, 2) % M

Checking that we get the expected result:

>>> slow_binary_concat(4)
220
>>> slow_binary_concat(10)
462911642

Now we'll write a faster version. First, we split the range [1, n) into subintervals such that within each subinterval, all numbers have the same length in binary. For example, the range [1, 10) would be split into four subintervals: [1, 2), [2, 4), [4, 8) and [8, 10). Here's a function to do that splitting:

def split_by_bit_length(n):
    """
    Split the numbers in [1, n) by bit-length.

    Produces triples (a, b, 2**k). Each triple represents a subinterval
    [a, b) of [1, n), with a < b, all of whose elements have bit-length k.
    """
    a = 1
    while n > a:
        b = 2 * a
        yield (a, min(n, b), b)
        a = b

Example output:

>>> list(split_by_bit_length(10))
[(1, 2, 2), (2, 4, 4), (4, 8, 8), (8, 10, 16)]

Now for each subinterval, the value of the concatenation of all numbers in that subinterval is represented by a fairly simple mathematical sum, which can be computed in exact form. Here's a function to compute that sum modulo M:

def subinterval_concat(a, b, l):
    """
    Concatenation of values in [a, b), all of which have the same bit-length k.
    l is 2**k.

    Equivalently, sum(i * l**(b - 1 - i) for i in range(a, b)) modulo M.
    """
    n = b - a
    inv = pow(l - 1, -1, M)
    q = (pow(l, n, M) - 1) * inv
    return (a * q + (q - n) * inv) % M

I won't go into the evaluation of the sum here: it's a bit off-topic for this site, and it's hard to express without a good way to render formulas. If you want the details, that's a topic for https://math.stackexchange.com, or a page of fairly simple algebra.
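For readers who want the algebra on this page anyway, here is one possible route to the closed form used in subinterval_concat. It is a sketch, not necessarily the derivation the author had in mind. It writes \ell for the code's l and n = b - a; q matches the variable of the same name in the code, while S, T, m, s and j are symbols introduced here.

```latex
\begin{align*}
S &= \sum_{i=a}^{b-1} i\,\ell^{\,b-1-i}
   = \sum_{m=0}^{n-1} (a+m)\,\ell^{\,n-1-m}
   = a\,q + T, \qquad n = b-a,\\
q &= \sum_{j=0}^{n-1} \ell^{\,j} = \frac{\ell^{\,n}-1}{\ell-1},\\
T &= \sum_{m=0}^{n-1} m\,\ell^{\,n-1-m}
   = \sum_{s=1}^{n-1}\,\sum_{j=0}^{s-1} \ell^{\,j}
   = \sum_{s=1}^{n-1} \frac{\ell^{\,s}-1}{\ell-1}
   = \frac{(q-1)-(n-1)}{\ell-1}
   = \frac{q-n}{\ell-1},\\
S &= a\,q + \frac{q-n}{\ell-1}.
\end{align*}
```

Modulo M, each division by \ell - 1 becomes a multiplication by its modular inverse, which is the role of inv in the code.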

Finally, we want to put all the intervals together. Here's a function to do that.

def fast_binary_concat(n):
    """
    Fast version of slow_binary_concat.
    """
    acc = 0
    for a, b, l in split_by_bit_length(n + 1):
        acc = (acc * pow(l, b - a, M) + subinterval_concat(a, b, l)) % M
    return acc

A comparison with the slow version shows that we get the same results:

>>> fast_binary_concat(4)
220
>>> fast_binary_concat(10)
462911642

But the fast version can easily be evaluated for much larger inputs, where using the slow version would be infeasible:

>>> fast_binary_concat(10**9)
827129560
>>> fast_binary_concat(10**18)
945204784

I hadn't added the code so that the reader still has to put in some effort :) — Abhinav Mathur, Oct 22 '20 at 04:58
This is awesome! I've added a C++ solution and explained the math a bit more clearly. Have a look here: https://stackoverflow.com/a/67252483/5524175 — duplex143, Apr 25 '21 at 10:57

Abhinav Mathur · Answer 2 · 2020-10-21T08:54:26.030

You just have to note a simple pattern. Taking up your example for n=4, let's gradually build the solution starting from n=1.

1 -> 1                         #1
2 -> 2^2(1) + 2                #6
3 -> 2^2[2^2(1)+2] + 3         #27
4 -> 2^3{2^2[2^2(1)+2]+3} + 4  #220
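The pattern above is the recurrence value(k) = value(k-1) * 2^D(k) + k, where D(k) is the number of bits in k. A minimal sketch of it follows, assuming Python's int.bit_length() for D; the function name concat_by_recurrence is hypothetical. It is still O(n), so it only illustrates the pattern and is not the fast solution.

```python
M = 10**9 + 7  # modulus from the question


def concat_by_recurrence(n):
    """Build the concatenation of 1..n one number at a time, modulo M."""
    value = 0
    for k in range(1, n + 1):
        # Shift the running value left by D(k) bits, then append k.
        value = ((value << k.bit_length()) + k) % M
    return value


print([concat_by_recurrence(n) for n in range(1, 5)])  # [1, 6, 27, 220]
```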

If you expand the coefficients of each term for n=4, you'll get the coefficients as:

1 -> (2^3)*(2^2)*(2^2)
2 -> (2^3)*(2^2)
3 -> (2^3)
4 -> (2^0)

Let N be the total number of bits in the binary string of the required number, and let D(x) be the number of bits in x. The coefficients can then be written as

1 -> 2^(N-D(1))
2 -> 2^(N-D(1)-D(2))
3 -> 2^(N-D(1)-D(2)-D(3))
... and so on

Since D(x) is the same for all x in the range [2^t, 2^(t+1)-1] for a given t, you can break the problem into such ranges and solve each range using mathematics (not iteration). Since the number of such ranges is about log2(n), this should work within the given time limit.
As an example, the various ranges become:

1. 1 (D(x) = 1)
2. 2-3 (D(x) = 2)
3. 4-7 (D(x) = 3)
4. 8-15 (D(x) = 4)
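As a small illustration of working range by range, the sketch below computes N, the total number of bits, by visiting one range per bit-length instead of every number. The function name total_bits is made up here and is not part of the answer; the last range is clipped at n.

```python
def total_bits(n):
    """Total number of bits in the concatenation of binary 1..n."""
    total = 0
    d = 1    # current bit-length D(x)
    lo = 1   # smallest number with d bits, i.e. 2**(d - 1)
    while lo <= n:
        hi = min(n, 2 * lo - 1)      # largest number <= n with d bits
        total += d * (hi - lo + 1)   # every x in [lo, hi] contributes d bits
        lo *= 2
        d += 1
    return total
```

For n = 4 this gives 1 + 2*2 + 3 = 8, matching the 8 bits of 11011100. The loop runs once per bit-length, so about log2(n) times.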

Can that sum be evaluated faster than just summing up every term? — harold, Oct 21 '20 at 08:02
The terms form an arithmetico–geometric sequence, and the sum can be calculated in logarithmic time. https://stackoverflow.com/a/67252483/5524175 — duplex143, Apr 25 '21 at 11:05
@abhinav-mathur downvote doesn't mean that the answer is wrong. https://meta.stackexchange.com/questions/2451/why-do-you-cast-downvotes-on-answers https://meta.stackexchange.com/questions/39161/guide-for-upvoting-and-downvoting — duplex143, Apr 25 '21 at 13:50
