I am trying to implement Huffman coding in C#. I have a problem with encoding large files, as it takes too much time. For example, encoding an 11 MiB binary file takes 10 seconds in debug mode, and I did not even bother waiting for my program to finish with a 27 MiB file.
Here is the problematic loop:
BitArray bits = new BitArray(8);
byte[] byteToWrite = new byte[1];
byte bitsSet = 0;
while ((bytesRead = inputStream.Read(buffer, 0, 4096)) > 0) // Read input in chunks
{
    for (int i = 0; i < bytesRead; i++)
    {
        for (int j = 0; j < nodesBitStream[buffer[i]].Count; j++)
        {
            if (bitsSet != 8)
            {
                bits[bitsSet] = nodesBitStream[buffer[i]][j];
                bitsSet++;
            }
            else
            {
                bits.CopyTo(byteToWrite, 0);
                outputStream.Write(byteToWrite, 0, byteToWrite.Length);
                bits = new BitArray(8);
                bitsSet = 0;
                bits[bitsSet] = nodesBitStream[buffer[i]][j];
                bitsSet++;
            }
        }
    }
}
nodesBitStream is a Dictionary<byte, List<bool>>. Each List<bool> represents the path from the Huffman tree root to the leaf node containing a specific symbol, which is represented as a byte.
So I am accumulating bits to form a byte, which I then write to an encoded file. It is quite obvious that this can take a very long time, but I have not figured out another way yet. Therefore, I am asking for advice on how to speed up the process.
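
For reference, here is a minimal sketch of one direction that might help. It is not a definitive implementation. It assumes that no code is longer than 32 bits (Huffman codes over 256 symbols can in principle be longer), packs each List<bool> into an integer once up front, shifts whole codes into an integer accumulator, and collects output bytes in a buffer instead of writing them one at a time. The names HuffmanPacker, Pack and Encode are hypothetical. The bit order is chosen to match what BitArray.CopyTo produces (the first bit of a path lands in the lowest bit of the byte). Unlike the loop above, the sketch also writes the final partial byte, padded with zero bits.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of the original program.
public static class HuffmanPacker
{
    // Packs a root-to-leaf path into an integer; path[0] ends up in bit 0.
    // Assumption: no code is longer than 32 bits.
    public static (uint Bits, int Length) Pack(List<bool> path)
    {
        if (path.Count > 32)
            throw new ArgumentException("Code longer than 32 bits.");
        uint bits = 0;
        for (int i = 0; i < path.Count; i++)
            if (path[i]) bits |= 1u << i;
        return (bits, path.Count);
    }

    public static void Encode(Stream input, Stream output,
                              Dictionary<byte, List<bool>> nodesBitStream)
    {
        // One array lookup per symbol instead of dictionary lookups per bit.
        // Symbols missing from the dictionary get a zero-length code here.
        var codes = new (uint Bits, int Length)[256];
        foreach (var pair in nodesBitStream)
            codes[pair.Key] = Pack(pair.Value);

        byte[] inBuf = new byte[4096];
        byte[] outBuf = new byte[4096];
        int outPos = 0;
        ulong acc = 0;    // pending bits, oldest bit in bit 0
        int pending = 0;  // number of valid bits in acc (below 8 between symbols)
        int bytesRead;

        while ((bytesRead = input.Read(inBuf, 0, inBuf.Length)) > 0)
        {
            for (int i = 0; i < bytesRead; i++)
            {
                var code = codes[inBuf[i]];
                acc |= (ulong)code.Bits << pending;  // append the whole code at once
                pending += code.Length;
                while (pending >= 8)                 // emit every completed byte
                {
                    outBuf[outPos++] = (byte)acc;
                    acc >>= 8;
                    pending -= 8;
                    if (outPos == outBuf.Length)
                    {
                        output.Write(outBuf, 0, outPos);
                        outPos = 0;
                    }
                }
            }
        }

        if (pending > 0)
            outBuf[outPos++] = (byte)acc;  // final partial byte, zero-padded
        output.Write(outBuf, 0, outPos);
    }
}
```

With a toy table where 0x41 maps to the path [true] and 0x42 maps to [false, true], encoding the three bytes 0x41 0x42 0x41 yields the single byte 0x0D. A decoder would still need the original symbol count (or some other end marker) to ignore the padding bits; that part is not shown here.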