
I have implemented Huffman's algorithm in Node.js, and the API looks like this:

huffman.encode(inputFilename, outputFilename)
huffman.decode(inputFilename, outputFilename)

But I want to implement it like this:

inputStream.pipe(HuffmanEncoderStream).pipe(outputStream)
outputStream.pipe(HuffmanDecoderStream).pipe(inputStream)

The problem is that I need to read the content of the source file twice: first to build the frequency table and the Huffman tree, and second to actually encode the content. Is it possible to implement this with a Transform stream?

P.S. There are no problems with decoding.

Mark Adler
  • Not really, I think. You could try [adaptive Huffman coding](https://en.wikipedia.org/wiki/Adaptive_Huffman_coding), or you will have to make two passes. You could write that like `const code = await getHuffmanCode(getInputStream()); getInputStream().pipe(makeHuffmanEncoderStream(code)).pipe(outputStream);` – Bergi Sep 27 '20 at 19:58
  • No. Not with plain Huffman, because to create the encoding tree you must analyze the entire file, and you can't start encoding without the encoding tree. You could of course make certain assumptions about the input file (for instance, if you know it is English text) and use a kind of "generic" encoding tree. – derpirscher Sep 27 '20 at 20:23
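
The first pass of the two-pass approach from the comments could be sketched roughly as follows. This is only an illustration: `getHuffmanCode` is the name used in the comment above, while `countFrequencies` and `buildCodeTable` are hypothetical helpers, and the sketch assumes the stream yields `Buffer` chunks (no string encoding set) and represents codes as bit strings for readability rather than speed.

```javascript
// Pass 1: consume a readable stream and count how often each byte occurs.
// Assumes every chunk is a Buffer.
async function countFrequencies(readable) {
  const freq = new Array(256).fill(0);
  for await (const chunk of readable) {
    for (const byte of chunk) freq[byte]++;
  }
  return freq;
}

// Build a Huffman code table (byte value -> bit string) from the frequencies.
// Re-sorting on every merge is simple but slow; a priority queue would be faster.
function buildCodeTable(freq) {
  const nodes = [];
  freq.forEach((count, byte) => {
    if (count > 0) nodes.push({ count, byte });
  });
  if (nodes.length === 0) return new Map();
  if (nodes.length === 1) return new Map([[nodes[0].byte, '0']]);
  while (nodes.length > 1) {
    nodes.sort((a, b) => a.count - b.count);
    const [left, right] = nodes.splice(0, 2); // the two least frequent nodes
    nodes.push({ count: left.count + right.count, left, right });
  }
  const table = new Map();
  const walk = (node, prefix) => {
    if (node.byte !== undefined) {
      table.set(node.byte, prefix);
      return;
    }
    walk(node.left, prefix + '0');
    walk(node.right, prefix + '1');
  };
  walk(nodes[0], '');
  return table;
}

// Resolves to the code table once the whole input has been read.
async function getHuffmanCode(readable) {
  return buildCodeTable(await countFrequencies(readable));
}
```

The second pass then has to open a fresh stream over the same source, because the first stream has already been consumed. That is why this approach does not fit into a single `pipe` chain.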

1 Answer


Huffman's algorithm requires all of the data up front in order to compute the frequencies. However, nothing stops you from applying Huffman's algorithm to chunks of your data, which would allow streaming. If the chunks are large enough (hundreds of kilobytes to megabytes), the overhead of transmitting a code description will be very small in comparison. If the data is homogeneous, you will get about the same compression. If the data is not homogeneous, your compression might even improve, since the Huffman codes would be optimized to the statistics of the data local to that chunk.
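
A minimal sketch of such a chunked encoder as a Node.js `Transform` stream might look like the following. Everything specific here is an assumption made for illustration: the block layout (a 32-bit block length, then 256 32-bit frequencies as the code description, then the packed bits), the 256 KB default block size, and the helper names `buildCodeTable` and `encodeBlock`. Codes are kept as bit strings for clarity, not performance.

```javascript
const { Transform } = require('node:stream');

// Build a byte -> bit-string Huffman table from a 256-entry frequency array.
function buildCodeTable(freq) {
  const nodes = [];
  freq.forEach((count, byte) => {
    if (count > 0) nodes.push({ count, byte });
  });
  if (nodes.length === 1) return new Map([[nodes[0].byte, '0']]);
  while (nodes.length > 1) {
    nodes.sort((a, b) => a.count - b.count);
    const [left, right] = nodes.splice(0, 2);
    nodes.push({ count: left.count + right.count, left, right });
  }
  const table = new Map();
  const walk = (node, prefix) => {
    if (node.byte !== undefined) return void table.set(node.byte, prefix);
    walk(node.left, prefix + '0');
    walk(node.right, prefix + '1');
  };
  if (nodes.length > 0) walk(nodes[0], '');
  return table;
}

// Encode one block with a code computed from that block alone.
// Hypothetical layout: [uint32 length][256 x uint32 frequencies][packed bits].
function encodeBlock(block) {
  const freq = new Array(256).fill(0);
  for (const byte of block) freq[byte]++;
  const table = buildCodeTable(freq);

  const header = Buffer.alloc(4 + 256 * 4);
  header.writeUInt32BE(block.length, 0);
  freq.forEach((count, byte) => header.writeUInt32BE(count, 4 + byte * 4));

  const payload = [];
  let acc = 0;
  let nbits = 0;
  for (const byte of block) {
    for (const bit of table.get(byte)) {
      acc = (acc << 1) | (bit === '1' ? 1 : 0);
      if (++nbits === 8) {
        payload.push(acc);
        acc = 0;
        nbits = 0;
      }
    }
  }
  if (nbits > 0) payload.push(acc << (8 - nbits)); // zero-pad the last byte
  return Buffer.concat([header, Buffer.from(payload)]);
}

class HuffmanEncoderStream extends Transform {
  constructor(blockSize = 256 * 1024) {
    super();
    this.blockSize = blockSize;
    this.pending = Buffer.alloc(0);
  }

  // Buffer incoming data and emit one encoded block per full blockSize bytes.
  _transform(chunk, encoding, callback) {
    this.pending = Buffer.concat([this.pending, chunk]);
    while (this.pending.length >= this.blockSize) {
      this.push(encodeBlock(this.pending.subarray(0, this.blockSize)));
      this.pending = this.pending.subarray(this.blockSize);
    }
    callback();
  }

  // Encode whatever is left as a final, shorter block.
  _flush(callback) {
    if (this.pending.length > 0) this.push(encodeBlock(this.pending));
    callback();
  }
}
```

With this, the desired form `inputStream.pipe(new HuffmanEncoderStream()).pipe(outputStream)` works in a single pass. A matching decoder would read each header, rebuild the same tree from the stored frequencies, and decode until it has produced the recorded number of bytes. In this layout the header costs 1028 bytes per block, which is under 0.4% of a 256 KB block.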

Mark Adler