
I have read that the LZ4 algorithm is very fast and compresses reasonably well. But in my test app the compressed text is larger than the source text. What is the problem?

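// Fill a buffer with 65535 pseudo-random bytes.
srand(time(NULL));
std::string text;
for (int i = 0; i < 65535; ++i)
    text.push_back(static_cast<char>(rand() % 256));

std::cout << "Text size: " << text.size() << std::endl;

// Oversized output buffer so that incompressible input still fits.
char *compressedData = new char[text.size() * 2];
// LZ4_compress takes (source, dest, sourceSize).
int compressedSize = LZ4_compress(text.c_str(), compressedData, static_cast<int>(text.size()));

std::cout << "Compressed size: " << compressedSize << std::endl;
delete[] compressedData;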

I also tried LZ4_compress, but the result is the same. However, if I generate a string made of a single repeated symbol, or of just two different symbols, the data does compress.

wallyk
user2123079
  • 7
    What do you expect from compression of random data not having patterns (your question is actually the answer) ? –  Aug 05 '15 at 17:43
  • Text size: 65535 Compressed size: 65793 – user2123079 Aug 05 '15 at 17:43
  • 3
    Noise (== random data) is not compressible. It's a core property of a random source. For your test to be valid, you should load some real text into your buffer instead. – Cyan Aug 05 '15 at 22:18

1 Answer


Have a look at a description of the LZ4 algorithm. It compresses by replacing repeated byte sequences with references to earlier occurrences of the same sequence, using the data it has already processed as a dictionary.
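To make the "repeated sequences" point concrete, here is a rough sketch that measures how often a 4-byte window (LZ4's minimum match length) has already been seen earlier in a buffer. This is only a toy illustration, not LZ4's actual match finder; `repeatedWindowFraction` and `randomBytes` are hypothetical helper names made up for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <random>
#include <string>
#include <unordered_set>

// Fraction of positions whose 4-byte window already occurred earlier in the buffer.
// Toy measure of "how much could back-references help"; not LZ4's real matcher.
double repeatedWindowFraction(const std::string& data) {
    const std::size_t kWindow = 4;  // LZ4 needs at least 4 matching bytes
    if (data.size() < kWindow) return 0.0;

    std::unordered_set<std::uint32_t> seen;
    std::size_t repeats = 0;
    const std::size_t positions = data.size() - kWindow + 1;
    for (std::size_t i = 0; i < positions; ++i) {
        std::uint32_t key;
        std::memcpy(&key, data.data() + i, kWindow);  // pack 4 bytes into one key
        if (!seen.insert(key).second) ++repeats;      // insert fails if key was seen
    }
    return static_cast<double>(repeats) / static_cast<double>(positions);
}

// Hypothetical helper: n pseudo-random bytes from a fixed seed.
std::string randomBytes(std::size_t n, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> dist(0, 255);
    std::string out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        out.push_back(static_cast<char>(dist(rng)));
    return out;
}
```

Calling `repeatedWindowFraction(randomBytes(65535, 42))` should give a value very close to zero, because 65535 positions barely touch the 2^32 possible 4-byte values, while `repeatedWindowFraction(std::string(1000, 'a'))` is close to one. With nothing to reference, LZ4 has to emit the input as literals plus a little bookkeeping, which is why the output ends up slightly larger than the input.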

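Random text, or any other material without repeated sequences, will not compress well with it. For input like that, a bit-level entropy coder (Huffman coding, for example) may do better when some symbols occur more often than others. Uniformly random bytes, as in your test, cannot be compressed by any algorithm.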
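One way to check whether an entropy coder has anything to gain is to estimate the order-0 Shannon entropy of the buffer, in bits per byte. The sketch below does that; `byteEntropyBits` is a hypothetical name for this example, and the estimate looks only at byte frequencies, so it says nothing about repeated sequences.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <string>

// Order-0 Shannon entropy of a byte buffer, in bits per byte (0.0 to 8.0).
// A coder that treats bytes independently cannot average fewer bits than this.
double byteEntropyBits(const std::string& data) {
    if (data.empty()) return 0.0;

    std::array<std::size_t, 256> counts{};  // zero-initialized histogram
    for (unsigned char c : data) ++counts[c];

    const double n = static_cast<double>(data.size());
    double entropy = 0.0;
    for (std::size_t count : counts) {
        if (count == 0) continue;           // unused byte values contribute nothing
        const double p = static_cast<double>(count) / n;
        entropy -= p * std::log2(p);
    }
    return entropy;
}
```

A string of one repeated symbol scores 0 bits per byte, and a string that uses two symbols equally often scores 1 bit per byte. A buffer filled with `rand() % 256` should score close to 8, which leaves essentially no room for compression.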

wallyk