
I have a BitArray with 100 million elements, which is about 12.5 MB. I have to compress this array, and I am using the .NET Framework's GZipStream.

using System;
using System.Collections;
using System.IO;
using System.IO.Compression;

class Program
{
    public static byte[] Compress(byte[] bytData)
    {
        try
        {
            using (MemoryStream ms = new MemoryStream())
            {
                // Close the GZipStream before reading the buffer; the final
                // compressed block is only flushed when the stream is closed.
                using (Stream s = new GZipStream(ms, CompressionMode.Compress))
                {
                    s.Write(bytData, 0, bytData.Length);
                }
                return ms.ToArray();
            }
        }
        catch
        {
            return null;
        }
    }

    static void Main(string[] args)
    {
        BitArray t = GetArray();
        byte[] byteArray = new byte[100000000 / 8];   // 100,000,000 bits = 12,500,000 bytes
        t.CopyTo(byteArray, 0);
        byte[] compressedData = Compress(byteArray);
        Console.WriteLine(compressedData.Length);
    }
    public static BitArray GetArray()
    {
        Random r = new Random();
        BitArray result = new BitArray(100000000);
        for (int i = 0; i < result.Count; i++)
        {
            if (r.NextDouble() > .5)
            {
                result.Set(i, true);
            }
        }
        return result;
    }
}

But the size of compressedData is 12,515,308 bytes, which is bigger than the original array. Any ideas?

Maybe I need another compressor?

Leonid
  • Compression works by assigning short codes to common sequences in the data and long codes to rare sequences. If the data is completely random, there are not a lot of sequences that occur frequently, so the result can end up being longer than the original. Solution: don't compress random data. Or, if you have to, just use a PRNG (like Random) and store only the seed rather than the generated values ("procedural generation"). – dtb May 03 '12 at 20:12 (see the sketch after these comments)
  • Thanks. I need to use random data, so I can't use GZip; maybe another compressor. These are not exactly random numbers, but for the POC I use random data. It is like a user's encryption code, so it must be random. – Leonid May 03 '12 at 20:20
  • Compressing after encrypting is often a bad idea (you incur the compression overhead and there will be minimal, if any, actual size reduction for the reason @dtb mentioned). If possible try compressing the data before it's encrypted, that's when you'll get the best results. – carlosfigueira May 03 '12 at 20:41
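A minimal sketch of the seed-only idea from dtb's comment (hypothetical; it assumes the bits really are produced by a seeded PRNG such as Random and can be regenerated on demand):

    // Persist only the seed and the length instead of the 12.5 MB of generated bits.
    public static BitArray Regenerate(int seed, int bitCount)
    {
        Random r = new Random(seed);              // same seed => same bit sequence
        BitArray result = new BitArray(bitCount);
        for (int i = 0; i < result.Count; i++)
        {
            result.Set(i, r.NextDouble() > .5);
        }
        return result;
    }
    // Usage: store just the 4-byte seed, then later call Regenerate(seed, 100000000).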

2 Answers


Have you tried not using random data? Data that compresses well is not random. I believe the common compression algorithms look for patterns of bits in order to compress. As a simple test, you could write out those random bytes into a file, and then see what happens when you zip it.
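A quick way to run that test against the question's own code (a sketch; the all-zeros comparison buffer and the file name random.bin are illustrative, not from the answer):

    // Compare the random buffer against a trivially compressible one of equal size,
    // and dump the random bytes to disk so they can be zipped by hand.
    byte[] structured = new byte[100000000 / 8];       // all zeros: highly compressible
    File.WriteAllBytes("random.bin", byteArray);        // zip this file and compare sizes

    Console.WriteLine(Compress(byteArray).Length);      // random data: about as big as the input
    Console.WriteLine(Compress(structured).Length);     // zeros: a tiny fraction of the input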

Seth Flowers

No compressor can compress truly random data. (As was pointed out, you can compress pseudo-random data if you can deduce the seed and the algorithm.)

What is your application? Do you have real data to test it with?

Mark Adler