
I have a bunch of ids in the form `uint32_t-uint32_t`, that is, two unsigned integers separated by a hyphen. I'm looking to compress them. I was thinking of converting them to BCD (in other words, two digits per byte) and outputting the compressed string. For example, 90-1418:

90-1418 -> 0x39 0x30 0x2d 0x31 0x34 0x31 0x38
             \    /    |    \    /    \    /
              0x90    0x2d   0x14      0x18
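
The packing in the diagram could be sketched roughly as follows. This is only an illustration, not an existing library: `bcd_pack` is a hypothetical name, and the handling of an odd number of digits (padding the low nibble with 0xF) is an assumption, since the example above only shows even-length runs.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: packs an "a-b" id two decimal digits per byte,
 * keeping the hyphen as a literal 0x2d byte, as in the diagram.
 * Assumption: an odd-length digit run gets 0xF in its last low nibble.
 * Returns the number of bytes written, or 0 on bad input, empty input,
 * or a buffer that is too small. */
size_t bcd_pack(const char *id, uint8_t *out, size_t cap)
{
    size_t n = 0;

    while (*id != '\0') {
        uint8_t byte;

        if (*id == '-') {
            byte = 0x2d;                          /* hyphen kept as-is */
            id++;
        } else if (isdigit((unsigned char)*id)) {
            byte = (uint8_t)((*id - '0') << 4);   /* high nibble */
            id++;
            if (isdigit((unsigned char)*id)) {
                byte |= (uint8_t)(*id - '0');     /* low nibble */
                id++;
            } else {
                byte |= 0x0F;                     /* assumed padding */
            }
        } else {
            return 0;                             /* unexpected character */
        }

        if (n == cap)
            return 0;                             /* output buffer full */
        out[n++] = byte;
    }
    return n;
}
```

With this sketch, `bcd_pack("90-1418", buf, sizeof buf)` writes the four bytes 0x90 0x2d 0x14 0x18.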

Are there any C libraries that can compress/decompress this type of string?

Pete Darrow

1 Answer


You could store the pair as ( ( num2 * 2^bits1 ) + num1 ) * 16 + bits1, where bits1 is the number of bits in num1 without leading zeros. For 90-1418:

001011000101010110100111
\/\_________/\_____/\__/
 \          \      \   \___ The number of bits in first number (without leading zeros), padded to 4 bits.
  \          \      \______ The first number in big-endian byte order.
   \          \____________ The second number in big-endian byte order.
    \______________________ Padding so the whole takes a multiple of 8 bits.

That is 24 bits, i.e. 3 bytes.

Max = 32 + 4 + 32 + 4 = 72 bits = 9 bytes.

This is never longer than your BCD encoding, and it is usually shorter.
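
A minimal sketch of the layout in the diagram, for illustration only. The names `bit_length`, `pair_pack` and `pair_unpack` are hypothetical, not from any library. One caveat this sketch makes explicit: a 4-bit length field can only describe a first number of at most 15 bits, so the code rejects larger first numbers. Covering the full `uint32_t` range would need a wider length field, which is not shown here.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of significant bits in v (0 for v == 0). */
static unsigned bit_length(uint32_t v)
{
    unsigned n = 0;
    while (v != 0) {
        n++;
        v >>= 1;
    }
    return n;
}

/* Packs (a, b) as  padding | b | a | bits(a)  into big-endian bytes.
 * out must have room for 7 bytes. Returns the number of bytes written,
 * or 0 if a needs more than 15 bits (the 4-bit field cannot hold that). */
size_t pair_pack(uint32_t a, uint32_t b, uint8_t *out)
{
    unsigned la = bit_length(a);
    if (la > 15)
        return 0;

    /* ((b * 2^la) + a) * 16 + la, written with shifts */
    uint64_t v = ((((uint64_t)b << la) | a) << 4) | la;
    unsigned total = bit_length(b) + la + 4;
    size_t nbytes = (total + 7) / 8;

    for (size_t i = 0; i < nbytes; i++)
        out[i] = (uint8_t)(v >> (8 * (nbytes - 1 - i)));
    return nbytes;
}

/* Inverse of pair_pack. The caller must supply the byte count (1..7).
 * Returns 0 on success, -1 on input this sketch cannot have produced. */
int pair_unpack(const uint8_t *in, size_t nbytes, uint32_t *a, uint32_t *b)
{
    if (nbytes == 0 || nbytes > 7)
        return -1;

    uint64_t v = 0;
    for (size_t i = 0; i < nbytes; i++)
        v = (v << 8) | in[i];

    unsigned la = (unsigned)(v & 0xF);   /* low 4 bits: bits in a */
    v >>= 4;
    *a = (uint32_t)(v & ((UINT64_C(1) << la) - 1));
    v >>= la;                            /* what remains is b */
    if (v > UINT32_MAX)
        return -1;
    *b = (uint32_t)v;
    return 0;
}
```

For 90-1418 this sketch produces the three bytes 0x2C 0x55 0xA7, which is the bit string shown above, and `pair_unpack` recovers both numbers when given the length 3.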

ikegami
  • This is interesting. So basically I strip the hyphen and store the binary string in 3 bytes. Is this a specific algorithm? And do you know if there is a library implementing this? – Pete Darrow Dec 08 '16 at 21:13
  • You know how 345 = ( ( 3 * 10 ) + 4 ) * 10 + 5? Same idea, but the base is variable. – ikegami Dec 08 '16 at 21:14
  • This solution, like yours, assumes you know the length of the whole. (It would require another 4 bits to not need that.) – ikegami Dec 08 '16 at 21:15
  • Sorry, do you mean the length of the whole numerical string? i.e., `901418` – Pete Darrow Dec 08 '16 at 21:19