
I want to reduce the number of characters used to represent a very big number.

For example, take this number: 4593638492...3837292 (1000+ characters long).

What techniques or tricks can I use to represent this number using fewer characters (say, around 50 characters or even fewer)?

Thanks in advance :)

dickson
    Perhaps by using a higher **numerical base** than base 10, such as hexadecimal (base 16). This reduces the number of digits since each digit's place value in base 16 is greater than it would be in base 10 (so you need a smaller number of digits to represent the same number)). – Telescope Jul 15 '22 at 23:48
  • 4
    Representing a 1000-decimal-digit number in 50 characters would require an alphabet of 100,000,000,000,000,000,000 different characters, which is stretching the bounds of reality, since no-one could learn how to read such an alphabet. What is it that you are really trying to accomplish? – rici Jul 16 '22 at 00:12
  • 1
    @rici: 10^20 nearly fits in a 64 bits integer (1000 digits would take 52 such integers). This is a quite acceptable numeration base. I would not be surprised that it is effectively in use in some BigNum implementation. –  Jul 16 '22 at 08:21
  • @yves: indeed, GMP uses 64-bit limbs on many platforms. But that doesn't save (much) memory. (Python's native bignums use 15- or 30-bit limbs, which makes them about 6% bigger.) But I don't believe that's what OP is talking about. They use the word "character", which in normal discourse would not be applied to an alphabet of 2^64 possibilities, and there is no evidence that they meant anything else. "Representing the number using fewer characters" is certainly not the same as "storing the number with a different limb size". – rici Jul 16 '22 at 15:46
  • How are these big numbers obtained, how important is it to represent the number exactly vs. approximately, and what is the use case? If this is for human readability, you can give the number a name and define it once. "Dickson's number" or whatever. – Dave Jul 16 '22 at 19:57
  • @rici: read my answer. –  Jul 17 '22 at 14:18

2 Answers


You can't. (Unless your numbers were created from a simple rule, such as, say, the factorial of an integer.)

Characters are represented in 8 bits (one byte), and a 1000-digit number holds about 3322 bits of information. At best you can pack it into 416 characters. (Or 208 characters of a 16-bit wide type, but that brings no compression.)


Think of it this way: with three-digit numbers, you can represent 1000 distinct values (from 000 to 999). This does not allow you to encode arbitrary four-digit numbers, as there are 10000 of those. Unless, for some reason, there are fewer than 1000 four-digit numbers that need to be encoded, and that subset has a computable definition (e.g. only the odd multiples of 7, or only the primes).
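The arithmetic behind that bound can be checked with a short computation (Python here, purely for illustration):

```python
import math

# A 1000-digit decimal number carries 1000 * log2(10) ≈ 3321.93 bits
# of information.
bits = 1000 * math.log2(10)

# Packed into 8-bit characters, the best possible is ceil(bits / 8) bytes.
chars = math.ceil(bits / 8)

print(round(bits, 2))  # 3321.93
print(chars)           # 416
```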


First, determine whether there's a simple mathematical expression that gives the number. Maybe a factorization. For example, it takes far less space to write 2^64*3^40 than 224269343257001716702690972139746492416.
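A quick sanity check of the size difference (Python, just for illustration):

```python
# A product of prime powers is far shorter to write than its
# decimal expansion.
n = 2**64 * 3**40

print(len(str(n)))        # 39 decimal digits
print(len("2^64*3^40"))   # 9 characters for the expression
```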

Otherwise, as Telescope suggested in their comment, use a higher numerical base. In general, writing the integer n in base b takes approximately (log n)/(log b) characters.

Base 36 is a convenient choice, with case-insensitive alphanumeric numerals, which use 64.25% of the space of the base-10 equivalent.

If you want to stick to ASCII, try base 94, using the printable ASCII characters 33 through 126 as digits. Base-94 numerals are 50.68% of the length of the base-ten equivalent.
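A minimal base-conversion sketch illustrating both alphabets (the 100-digit sample number is an arbitrary stand-in, since the OP's number is elided):

```python
import string

def to_base(n, digits):
    """Encode a non-negative integer n using the given digit alphabet."""
    b = len(digits)
    out = []
    while n:
        n, r = divmod(n, b)
        out.append(digits[r])
    return "".join(reversed(out)) or digits[0]

BASE36 = string.digits + string.ascii_lowercase           # 36 symbols
BASE94 = "".join(chr(c) for c in range(33, 127))          # printable ASCII, 94 symbols

n = int("4593638492" * 10)       # arbitrary 100-decimal-digit stand-in
print(len(str(n)))               # 100
print(len(to_base(n, BASE36)))   # 65 — roughly 64% of the decimal length
print(len(to_base(n, BASE94)))   # 51 — roughly 51% of the decimal length
```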

If you can use Unicode, maybe pick a block of 1000 Kanji characters to use as your "digits". That takes only 1/3 of the space of base-ten. Or if you can find 10,000 usable "digits", you're down to only 1/4 of the space.
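As a sketch of that idea (the choice of the CJK Unified Ideographs block starting at U+4E00 as the digit alphabet is just an assumption for illustration):

```python
# Hypothetical "base 1000" using 1000 consecutive CJK ideographs as digits.
DIGITS = [chr(0x4E00 + i) for i in range(1000)]

def encode_base1000(n):
    """Encode a non-negative integer, three decimal digits per character."""
    out = []
    while n:
        n, r = divmod(n, 1000)
        out.append(DIGITS[r])
    return "".join(reversed(out)) or DIGITS[0]

n = int("4593638492" * 10)       # arbitrary 100-decimal-digit stand-in
print(len(encode_base1000(n)))   # 34 — about a third of the 100 decimal digits
```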

dan04
    Your estimates are wrong. Larger bases require fewer digits, but the digits take more space. Whatever the representation, you need at least 3322 bits, and bases that are not a power of 2 will take more. –  Jul 16 '22 at 08:25
  • @YvesDaoust: True *if* you have an optimal binary representation for the digits. However, if you're constrained to an existing character encoding like ASCII, everything from base-2 (`0` and `1`) to base-94 (all printable characters as digits) will consume 8 bits per digit regardless. – dan04 Jul 18 '22 at 16:40
  • You are contradicting yourself; your own example is with Kanji. –  Jul 18 '22 at 16:48