
I need to compress random stream data like [25,94,182,3,254, ...]. The number of values is close to 4 million. I currently only get a 1.4x ratio with Huffman coding. The LZW algorithm I tried takes too much time to compress. I hope to find an efficient compression method that still gives a high compression ratio, at least 3x. Is there another algorithm that would be able to compress this random data better?

Vincent 炜森
  • (1) There are tons of benchmarks for compressors, and they don't care whether you have 8-bit or 32-bit values (internally they work on either bytes or bits), although some use cases might need something tuned (especially with additional filters). (2) If your data is uniformly random it can't be compressed (this can be proven), which shows that your question is lacking detail. (3) This question is borderline off-topic here (at least too broad). – sascha Sep 11 '17 at 14:57
  • If this comes from any kind of random number generator, use a different one. Really random streams can't be compressed. Seeing a 1.4x ratio shows that there is quite some regularity in the stream. – Ralf Kleberhoff Sep 11 '17 at 15:03
  • Truly random (and crypto-secure) data is hard to compress. If you get such a high ratio, it means your random numbers are not so random. – Marcin Szwarc Sep 11 '17 at 16:35
  • Your question has a contradiction: truly random numbers cannot be compressed at all (no matter the algorithm). What does your stream really contain? – geza Sep 11 '17 at 17:51
  • `The LZW algorithm I tried [takes] too much time to compress` - after quoting a result using `Huffman code`(faster?!): what LZW implementation did you use? – greybeard Sep 12 '17 at 06:33
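
As the comments above point out, uniformly random data cannot be compressed at all, so a 1.4x ratio already tells you something about the stream. A quick way to see the first claim in practice (a sketch using Python's standard library, not code from the question) is to feed truly random bytes to a general-purpose compressor:

```python
import os
import zlib

# 4 million uniformly random bytes, roughly the size of the stream in the question
data = os.urandom(4_000_000)

compressed = zlib.compress(data, 9)
print(len(data), len(compressed), len(data) / len(compressed))
# The output is not smaller (ratio ~1.0 or slightly worse), so any real gain,
# like the 1.4x in the question, means the stream is not uniformly random.
```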

1 Answer


It depends on the distribution of the RNG. A compression ratio of 1:1.4 suggests that it's not uniform, or not a good RNG. Huffman and arithmetic coding are practically the only options*, since there is no correlation between successive entries of a good RNG to exploit.
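
Since successive symbols are treated as independent, the achievable rate is fixed by the byte histogram alone. Here is a minimal sketch of that idea (my own illustration, not the answerer's code; `stream` is a placeholder for the real 4-million-entry data) that builds Huffman code lengths from the histogram and reports the expected bits per symbol:

```python
import heapq
from collections import Counter

def huffman_code_lengths(data):
    """Return the Huffman code length in bits for each distinct symbol."""
    counts = Counter(data)
    # Each heap entry: (subtree weight, tie-breaker, {symbol: code length so far})
    heap = [(weight, i, {sym: 0}) for i, (sym, weight) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every code word inside them
        merged = {sym: length + 1 for sym, length in {**left, **right}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

stream = bytes([25, 94, 182, 3, 254]) * 1000   # placeholder; substitute the real stream
lengths = huffman_code_lengths(stream)
counts = Counter(stream)
total_bits = sum(counts[sym] * lengths[sym] for sym in counts)
print("average code length:", total_bits / len(stream), "bits/symbol")
print("compression ratio vs. 8-bit input:", 8 * len(stream) / total_bits)
```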

*To be precise, the best compression scheme has to be a zero-order statistical compressor that is able to allocate a variable number of bits to each symbol in order to reach the Shannon entropy

H(X) = -\sum_{i=1}^{N} P(x_i) \log_2 P(x_i)
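
Evaluating this bound directly on the stream shows the best ratio any zero-order coder can reach. A small sketch (assuming the data fits in memory; `stream` is again a placeholder for the real data):

```python
import math
from collections import Counter

def shannon_entropy(stream):
    """Zero-order entropy H(X) in bits per symbol."""
    counts = Counter(stream)
    n = len(stream)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

stream = [25, 94, 182, 3, 254] * 1000          # placeholder for the real data
h = shannon_entropy(stream)
print("entropy:", h, "bits/symbol")
print("best possible ratio for 8-bit symbols:", 8 / h)
```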

The theoretical best is achieved by arithmetic coding, but other encodings can come close by chance (that is, when the symbol probabilities happen to lie near negative powers of two). Arithmetic coding can allocate less than one bit per symbol, whereas Huffman or Golomb coding needs at least one bit per symbol (or symbol group).
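
For intuition on that last point, take a heavily skewed binary source (numbers chosen for illustration, not from the answer): with P(0) = 0.99 the entropy is roughly 0.08 bits per symbol, which arithmetic coding can approach, while a per-symbol Huffman code still spends a full bit on every symbol:

```python
import math

p = 0.99  # probability of the dominant symbol (illustrative value)
entropy = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
print("entropy: %.3f bits/symbol" % entropy)   # ~0.081, approachable by arithmetic coding
print("per-symbol Huffman: 1 bit/symbol")      # a code word can never be shorter than 1 bit
```

Huffman can only close that gap by coding blocks of symbols at once, which is what "(or symbol group)" above refers to.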

Aki Suihkonen