
I am solving a problem where I am given three integers (a, b, c). All three can be very large, and (a > b > c).

I want to identify which base between b and c produces the smallest sum of digits when we convert 'a' to that base.

For example, for a = 216, b = 2, c = 7 the output is 6: 216 in base 2 is 11011000, whose digit sum is 4. If we do the same for every base from 2 to 7, we find that base 6 produces the smallest digit sum, because 216 in base 6 is 1000, whose digit sum is 1.
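A quick way to check the example is to print each representation and its digit sum. The helper `digits_in_base` is a hypothetical name, not from the question; it collects digits with `divmod`, least significant first, and reverses them at the end.

```python
def digits_in_base(n, base):
    """Digits of n in the given base, most significant first (hypothetical helper)."""
    if n == 0:
        return [0]
    digits = []
    while n:
        n, r = divmod(n, base)  # r is the next digit, least significant first
        digits.append(r)
    return digits[::-1]

a, b, c = 216, 2, 7
for base in range(b, c + 1):
    d = digits_in_base(a, base)
    print(base, "".join(map(str, d)), sum(d))  # joining str digits only works for bases <= 10
# 2 11011000 4
# 3 22000 4
# 4 3120 6
# 5 1331 8
# 6 1000 1
# 7 426 12
```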

My question is: is there any function out there that can convert a number to any base in constant time, or at least faster than the algorithm below? Or do you have any suggestions on how to optimise my algorithm?

from collections import defaultdict
n = int(input())
for _ in range(n):
    (N,X) = map(int,input().split())
    array = list(map(int,input().split()))
    my_dict = defaultdict(int)

    #original count of elements in array
    for i in range(len(array)):
        my_dict[array[i]] +=1

    #ensure array contains distinct elements
    array = set(array)
    count = max(my_dict.values())  #count= max of single value
    temp = count
    res = None
    XOR_count = float("inf")
    if X==0:
        print(count,0)
        continue  # go on to the next test case; `break` would skip all remaining ones
    for j in array:
        if j^X in my_dict:
            curr = my_dict[j^X] + my_dict[j]
            if curr>=count:
                count = curr
                XOR_count = min(my_dict[j],XOR_count)
    if count ==temp:
        XOR_count = 0
    print(f"{count} {XOR_count}")

Here are some sample inputs and outputs:

Sample Input  
3
3 2
1 2 3
5 100
1 2 3 4 5
4 1
2 2 6 6

Sample Output  
2 1
1 0
2 0

For the problem I am solving, this runs into a time limit exceeded error.
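To test the loop above against the samples without piping stdin, its per-test-case body can be pulled into a function. This is a sketch of the same logic, not a speed-up: the name `solve` is hypothetical, `collections.Counter` replaces the manual `defaultdict` counting loop, and the `X == 0` case returns early instead of printing.

```python
from collections import Counter

def solve(X, array):
    # same logic as the loop body above, returned instead of printed
    my_dict = Counter(array)            # occurrences of each value
    count = max(my_dict.values())       # best count using a single value
    if X == 0:
        return count, 0
    temp = count
    XOR_count = float("inf")
    for j in set(array):                # distinct values only
        if j ^ X in my_dict:            # partner value j^X is present
            curr = my_dict[j ^ X] + my_dict[j]
            if curr >= count:
                count = curr
                XOR_count = min(my_dict[j], XOR_count)
    if count == temp:                   # no pair exceeded the single-value count
        XOR_count = 0
    return count, XOR_count

for X, array in [(2, [1, 2, 3]), (100, [1, 2, 3, 4, 5]), (1, [2, 2, 6, 6])]:
    print(*solve(X, array))
# 2 1
# 1 0
# 2 0
```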

I found this link quite useful for converting between log bases (https://www.purplemath.com/modules/logrules5.htm). I can roughly see how it relates, but I couldn't use it to solve the problem above.
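What the change-of-base rule does give is the *length* of a in a base k (k is used here for a candidate base so it does not clash with b and c). It says nothing about the digits themselves, so producing them still takes one division per digit. In floating point, `ln a / ln k` can also land just below an integer at exact powers, so an exact integer loop is safer than `math.log` for counting digits.

```latex
\operatorname{len}_k(a) = \lfloor \log_k a \rfloor + 1 = \left\lfloor \frac{\ln a}{\ln k} \right\rfloor + 1,
\qquad \text{e.g.}\quad \lfloor \log_6 216 \rfloor + 1 = 3 + 1 = 4 \;(216 = 1000_6),
\quad \lfloor \log_2 216 \rfloor + 1 = 7 + 1 = 8 \;(216 = 11011000_2).
```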

  • I don't think it's possible. Converting between bases requires a loop to generate all the digits in the result, so it will depend on the size of the input. – Barmar Sep 09 '21 at 16:39
  • Python supports infinite precision integers, so clearly there is no constant time algorithm: it has to be proportional to the size of the integer at the very least – Mad Physicist Sep 09 '21 at 16:39
  • Your input/output is O(n) space (n~digits). The algorithm *must* be O(n) as well unless it ignores digits. – MisterMiyagi Sep 09 '21 at 16:40
  • Thanks for the input guys. How about a faster algorithm than the above? I will amend the question – Patrick_Chong Sep 09 '21 at 16:41
  • If "all three can be very large", there won't be a (significantly) faster algorithm and no *builtin* fast/compiled function that already does it for you. – MisterMiyagi Sep 09 '21 at 16:48
  • Okay, thanks Mister. The only thing is that I am getting a Time Out Error for my solution; however, the above is not the entire code. Let me share the entire code. It would be very helpful to understand why this is timing out. I was told it is to do with the 'while' loop in the above function, but if you say there is no more optimisation there, perhaps the issue lies somewhere else. – Patrick_Chong Sep 09 '21 at 16:50
  • Just one idea, I hope not fully stupid: why not use tools dedicated to manipulating large amounts of data, such as a database? I have no clue which would be the most powerful, but the idea is to replace your arrays and dictionaries with tables and, if necessary, dedicated tools (like pandas or spark). One objective: do not run out of time, and at least be able to provide a response. – Christophe Sep 09 '21 at 17:06
  • No, no databases allowed. Simply code – Patrick_Chong Sep 09 '21 at 17:14
  • There is defo an optimisation required somewhere is all I know – Patrick_Chong Sep 09 '21 at 17:14
  • Where is this from, i.e., where can we test this ourselves? – no comment Sep 09 '21 at 18:01
  • @Uplus263A Hmm... for their example `b=2, c=7` that's not even true. – no comment Sep 09 '21 at 18:05
  • Um... I don't see what your code and sample data has to do with your question at all. Seem to be for an entirely different problem. – no comment Sep 09 '21 at 19:14

1 Answer


You could separate the problem in smaller concerns by writing a function that returns the sum of digits in a given base and another one that returns a number expressed in a given base (base 2 to 36 in my example below):

def digitSum(N,b=10):
    return N if N<b else N%b+digitSum(N//b,b)

digits = "0123456789abcdefghijklmnopqrstuvwxyz"
def asBase(N,b):
    return "" if N==0 else asBase(N//b,b)+digits[N%b]

def lowestBase(N,a,b):
    return asBase(N, min(range(a,b+1),key=lambda c:digitSum(N,c)) )

output:

print(lowestBase(216,2,7))
1000     # base 6

print(lowestBase(216,2,5))
11011000 # base 2

Note that both digitSum and asBase could be written iteratively instead of recursively if you're manipulating numbers greater than base^1000 and don't want to deal with recursion depth limits (Python's default recursion limit is 1000).
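For asBase, an iterative sketch (keeping the same `digits` string and the base 2 to 36 range) can collect digits with `divmod` and reverse them at the end, so there is no recursion depth to worry about:

```python
digits = "0123456789abcdefghijklmnopqrstuvwxyz"

def asBase(N, b):
    # iterative version of asBase (bases 2 to 36); no recursion, so no depth limit
    out = []
    while N:
        N, r = divmod(N, b)            # r is the next digit, least significant first
        out.append(digits[r])
    return "".join(reversed(out))      # like the recursive version, gives "" for N == 0

print(asBase(216, 6))   # 1000
print(asBase(216, 2))   # 11011000
```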

Here's a procedural version of digitSum (to avoid recursion limits):

def digitSum(N,b=10):
    result = 0
    while N:
        result += N%b
        N //=b
    return result

and returning only the base (not the encoded number):

def lowestBase(N,a,b):
    return min(range(a,b+1),key=lambda c:digitSum(N,c))

In that case you don't need the asBase() function at all.

With those changes, results for a range of bases from 2 to 1000 are returned in under 60 milliseconds:

lowestBase(10**250+1,2,1000)  --> 10 in 57 ms

lowestBase(10**1000-1,2,1000) --> 3 in 47 ms

I don't know how large "very large" is, but it is still sub-second for about a million bases (though with a relatively smaller number):

lowestBase(10**10-1,2,1000000) --> 99999 in 0.47 second

lowestBase(10**25-7,2,1000000) --> 2 in 0.85 second

[EDIT] optimization

By providing a maximum sum to the digitSum() function, you can make it stop counting as soon as it reaches that maximum. This lets lowestBase() reject a candidate base as soon as its partial sum can no longer beat the current best (the minimal sum so far). Going through the bases backwards also gives a better chance of hitting small digit sums early, since larger bases produce fewer digits, which makes the maxSum parameter of digitSum() more effective:

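def digitSum(N,b=10,maxSum=None):
    result = 0
    while N:
        result += N%b
        if maxSum and result>=maxSum:break  # can no longer beat maxSum: stop early
        N //= b
    return result

def lowestBase(N,a,b):
    minBase = a
    minSum  = digitSum(N,a)             # full digit sum for the first base
    for base in range(b,a,-1):          # remaining bases, largest first
        if N%base >= minSum: continue   # last digit already too large
        baseSum = digitSum(N,base,minSum)
        if baseSum < minSum:            # strict <: on ties the base checked first is kept
            minBase,minSum = base,baseSum
            if minSum == 1: break       # 1 is the smallest possible sum when N >= 1
    return minBase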

This should yield a significant performance improvement in most cases.
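Because the backwards loop only replaces the current best on a strictly smaller sum, ties can resolve to a different base than the first `min()` version, which returns the smallest base. The sketch below checks on random inputs that both versions reach the same minimal digit sum, and shows one tie case; `lowestBaseBrute` and the random test ranges are hypothetical additions for illustration.

```python
import random

def digitSum(N, b=10, maxSum=None):
    # early-exit digit sum, as in the [EDIT] version above
    result = 0
    while N:
        result += N % b
        if maxSum and result >= maxSum:
            break
        N //= b
    return result

def lowestBase(N, a, b):
    # pruned backwards search, as in the [EDIT] version above
    minBase, minSum = a, digitSum(N, a)
    for base in range(b, a, -1):
        if N % base >= minSum:
            continue
        baseSum = digitSum(N, base, minSum)
        if baseSum < minSum:
            minBase, minSum = base, baseSum
            if minSum == 1:
                break
    return minBase

def lowestBaseBrute(N, a, b):
    # hypothetical reference: plain min(), which returns the smallest base on ties
    return min(range(a, b + 1), key=lambda c: digitSum(N, c))

random.seed(0)
for _ in range(500):
    N = random.randint(1, 10**6)
    a = random.randint(2, 50)
    b = a + random.randint(0, 100)
    fast, slow = lowestBase(N, a, b), lowestBaseBrute(N, a, b)
    # both must reach the same minimal digit sum, even if the bases differ
    assert digitSum(N, fast) == digitSum(N, slow), (N, a, b)

print(lowestBase(64, 3, 8), lowestBaseBrute(64, 3, 8))  # 8 4 (both have digit sum 1)
```

If the problem expects the smallest base on ties, the pruned version would need to keep scanning smaller bases even after finding an equal sum.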

Alain T.
  • Thanks Alain, would this optimise the run-time though? – Patrick_Chong Sep 10 '21 at 09:17
  • Also, is it possible to return the base rather than the number itself? So in the above case, returning 6 for the first example and 2 for the second? Thanks! – Patrick_Chong Sep 10 '21 at 09:23
  • It's actually simpler to return the base, because then you don't even need the `asBase` function, just `return min(range(a,b+1),key=lambda c:digitSum(N,c))`. I haven't done any comparisons, but the larger the base, the faster `digitSum` will perform, so even with bases from 2 to 1000 and a number with 250 digits, `lowestBase` takes roughly 60 milliseconds (beyond that you'll have to make digitSum iterative rather than recursive) – Alain T. Sep 10 '21 at 12:37
  • Thanks a lot Alain! I am still running into time limit exceeded error. All three of the variables can go as high as 10**9. Would this change anything? Because I've tried it all and still seem to be running into time limit exceeded error! – Patrick_Chong Sep 10 '21 at 13:30
  • Rather than iterating through all of the numbers l through r, is there any way I can skip/discard numbers? This is the only other thing I can think of that is causing the time limit to be exceeded – Patrick_Chong Sep 10 '21 at 13:31
  • Well, you could skip all powers of bases that have already been checked, because the representation of N in base `b^k` will always have a sum of digits greater than or equal to its representation in base `b^1`, but I don't think that'll make a meaningful difference – Alain T. Sep 10 '21 at 14:28
  • Could you give an example please? As the input size is as big as 10**9 for each variable any optimisation I think would make a difference! – Patrick_Chong Sep 10 '21 at 14:31
  • numbers that are powers of a smaller number in the 10**9 range represent 0.003% of the bases to check. the additional code to exclude them would likely increase the total time rather than reduce it (really not worth the effort). – Alain T. Sep 10 '21 at 14:44
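Since the comments put all three values as high as 10**9, looping over every base will not fit a time limit in Python. One way to discard most bases, which is a standard square-root split rather than anything from the answer above, uses the fact that for any base k > √N, N has at most two digits: N = q·k + r with q = N // k, so the digit sum is q + N - q·k. That sum falls as k grows while q stays fixed, and q < √N there, so only the last base of each equal-q block needs checking. The sketch assumes 2 ≤ lo ≤ hi and N ≥ 1, and breaks ties toward the smallest base (the behaviour of the first `min()` version); the name `lowestBaseSqrt` is hypothetical.

```python
from math import isqrt

def digitSum(N, b):
    # plain digit sum of N in base b
    s = 0
    while N:
        N, r = divmod(N, b)
        s += r
    return s

def lowestBaseSqrt(N, lo, hi):
    """Smallest base in [lo, hi] minimising the digit sum of N (assumes 2 <= lo <= hi, N >= 1)."""
    best_base, best_sum = None, None

    def consider(base, s):
        nonlocal best_base, best_sum
        if best_sum is None or s < best_sum or (s == best_sum and base < best_base):
            best_base, best_sum = base, s

    # Part 1: bases with base*base <= N can have 3+ digits, so check them one by one.
    r = isqrt(N)
    for base in range(lo, min(hi, r) + 1):
        consider(base, digitSum(N, base))

    # Part 2: bases > sqrt(N) give at most two digits, q = N // base and N - q*base.
    base = max(lo, r + 1)
    while base <= hi:
        q = N // base
        if q == 0:
            consider(base, N)          # base > N: one digit equal to N, for every larger base too
            break
        end = min(hi, N // q)          # last base in range with the same quotient q
        consider(end, q + N - q * end) # sum decreases with base, so the block's last base is best
        base = end + 1
    return best_base

print(lowestBaseSqrt(216, 2, 7))  # 6
```

For N around 10**9 this does roughly √N ≈ 31,600 small-base digit sums plus at most as many two-digit blocks, instead of up to 10**9 iterations.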