I'm attempting to write a compressor with Huffman coding. The process involves using BitArrays to store the values. All's fine and dandy until I load something slightly larger.
Currently I have the program load a 93 MB MP4 video. Part of the encoding process looks like this:
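
    var encodedSource = new List<bool>();
    var bitList = new List<BitArray>();
    var listSize = 0;
    foreach (var t in source)
    {
        // Look up the Huffman code for this byte and append its bits to the buffer.
        var encodedSymbol = new bool[dictionary[t].Length];
        dictionary[t].CopyTo(encodedSymbol, 0);
        encodedSource.AddRange(encodedSymbol);
        // Once the buffer passes one million bits, pack it into a BitArray chunk.
        if (encodedSource.Count > 1000000)
        {
            bitList.Add(new BitArray(encodedSource.ToArray()));
            listSize += encodedSource.Count;
            encodedSource = new List<bool>();
        }
    }
    // Pack whatever is still in the buffer so the tail of the input is not lost.
    if (encodedSource.Count > 0)
    {
        bitList.Add(new BitArray(encodedSource.ToArray()));
        listSize += encodedSource.Count;
    }
    // Concatenate all chunks into a single BitArray.
    var bits = new BitArray(listSize);
    var index = 0;
    foreach (var bitArray in bitList)
    {
        foreach (var b in bitArray)
        {
            bits[index++] = (bool) b;
        }
    }
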
encodedSource and bitList seem to take far more space than they should (combined, they take around 800 MB upon completion).
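As a rough sanity check on that figure: a bool element occupies one byte in a .NET array, while a BitArray packs 32 bits into each int. The sketch below compares the two for a given bit count. BitStorageEstimate and its method names are made up for illustration, and it counts payload only, ignoring object headers, spare List capacity, and temporary copies.

```csharp
using System;

// Hypothetical helper, not part of the original program.
public static class BitStorageEstimate
{
    // Payload bytes when every bit is stored as its own bool element (1 byte each).
    public static long BoolPayloadBytes(long bitCount) => bitCount * sizeof(bool);

    // Payload bytes when bits are packed 32 to an int, as BitArray stores them.
    public static long PackedPayloadBytes(long bitCount) => (bitCount + 31) / 32 * sizeof(int);
}
```

For 720,000,000 encoded bits, BoolPayloadBytes gives 720,000,000 bytes and PackedPayloadBytes gives 90,000,000 bytes. That is in the same range as the numbers above, though whether it accounts for all of the 800 MB is an assumption: List<bool> also over-allocates as it grows, and ToArray() makes a further copy of each chunk.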
After the encoding is done, bitList is copied into bits, then into a byte array, and finally written to the file. bits seems to be a normal size, about 90 MB, and the resulting file, with headers and so on, is 91 MB, which is normal too. I can't figure out why encodedSource and bitList take so much space, or find a way to save some.
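The step from bits to a byte array is not shown above; one way it could look is sketched here, using BitArray.CopyTo with a byte[] destination. BitPacking and ToBytes are illustrative names, not taken from the real program.

```csharp
using System.Collections;

// Hypothetical helper for the BitArray-to-bytes step.
public static class BitPacking
{
    // Packs the bits into bytes, least significant bit first,
    // rounding the length up to a whole number of bytes.
    public static byte[] ToBytes(BitArray bits)
    {
        var bytes = new byte[(bits.Length + 7) / 8];
        bits.CopyTo(bytes, 0);
        return bytes;
    }
}
```

Because the last byte may be only partly filled, the number of valid bits has to be stored somewhere (for example in the file header) so a decoder knows where to stop.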
--- Explaining the code ---
I loaded each byte and its conversion into dictionary to speed up the lookup (time went from 5 minutes to 69 seconds). bitList exists because just saving everything into encodedSource takes way too much space. Copying it into bitList in chunks takes about half the memory, which is still more than the 1/8th it should actually take, but less.
Edit: I didn't realize I hadn't actually asked a question. The question is: why does it take so much space, and what can I do to mitigate that?
Also, I have thought about simply writing directly to the file every X bits, but I haven't gotten around to that yet. I'd like to solve this problem before getting there, but I can do that if need be.
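For reference, the write-as-you-go idea could look roughly like the sketch below: bits are packed into a single byte and written to the stream as soon as it fills, so no large List<bool> is ever built. BitStreamWriter is a hypothetical class written for this illustration, not an existing .NET type. It uses the same least-significant-bit-first order as BitArray.CopyTo, and it assumes zero padding in the final byte is acceptable.

```csharp
using System;
using System.Collections;
using System.IO;

// Hypothetical helper: packs bits into bytes and writes each byte once it is full.
public sealed class BitStreamWriter : IDisposable
{
    private readonly Stream _output;
    private int _current; // bits collected so far for the byte in progress
    private int _count;   // number of valid bits in _current (0..7)

    public BitStreamWriter(Stream output) => _output = output;

    // Appends every bit of one code, least significant bit first.
    public void Write(BitArray code)
    {
        for (var i = 0; i < code.Length; i++)
        {
            if (code[i]) _current |= 1 << _count;
            if (++_count == 8) FlushByte();
        }
    }

    private void FlushByte()
    {
        _output.WriteByte((byte)_current);
        _current = 0;
        _count = 0;
    }

    // Writes any partly filled final byte (padded with zero bits) and flushes.
    // The underlying stream is left open for the caller to dispose.
    public void Dispose()
    {
        if (_count > 0) FlushByte();
        _output.Flush();
    }
}
```

In the encoding loop this would replace the AddRange call with something like writer.Write(dictionary[t]), with the writer wrapped around a FileStream opened by the caller.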