I have never worked with compression, but I am interested in Huffman encoding. I have seen this given as a simple demo encoding for the first few letters:
A 0
E 10
P 110
space 1110
D 11110
T 111110
L 111111
A standard Huffman encoding would normally have a different set of codes, but that doesn't matter for this question. What I'm wondering is how to manipulate these bits most efficiently in JavaScript. The usual advice is to work in chunks of 8, 16, or 32 bits and nothing else, because those are the sizes in which the computer architecture stores integers. So, as I understand it, you should probably read the input in 8-bit chunks. I'm not exactly sure how to do this, but I think something like this would work:
var bytes = new Uint8Array(array)
var byte1 = bytes[0]
var byte2 = bytes[1]
...
This seems like the most efficient way to access the data. But there is an alternative I'm considering that I wanted to clarify. You could instead convert the input to a binary text string, that is, a string of 1s and 0s, as in:
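// toString(2) only takes a radix on numbers; a typed array ignores it, so convert byte by byte
var string = Array.from(bytes, b => b.toString(2).padStart(8, '0')).join('')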
But from what I've learned, converting anything to a string is a performance hit, so it seems best to avoid string conversions if possible.
If that's the case, we are left with the first approach using Uint8Array (or Uint32Array, etc.). I'm wondering how you would then split each value into its component codes efficiently. So if we had this...
010110
AEP
...and we read it in as 8-bit integers, then we might load a value like one of these:
01011000
01011001
00101100
...
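The first of those bytes is what you get if the six bits are packed starting from the most significant bit and the leftover bits are zero-padded. To make that concrete, here is a rough sketch of the packing step. The names `CODES` and `encode` are my own, and the MSB-first bit order and zero padding are assumptions, not something the demo specifies:

```javascript
// Hypothetical code table taken from the demo encoding above.
const CODES = {
  A: '0',
  E: '10',
  P: '110',
  ' ': '1110',
  D: '11110',
  T: '111110',
  L: '111111',
};

// Pack the codes for `text` into bytes, most significant bit first (assumed order).
// Returns the bytes plus the number of meaningful bits, because the last
// byte may be padded with zeros.
function encode(text) {
  const out = [];
  let current = 0;   // bits collected for the byte in progress
  let filled = 0;    // how many bits of `current` are in use
  let bitLength = 0; // total number of code bits written
  for (const ch of text) {
    const code = CODES[ch];
    if (code === undefined) {
      throw new Error('no code for ' + JSON.stringify(ch));
    }
    for (const bit of code) {
      current = (current << 1) | (bit === '1' ? 1 : 0);
      filled++;
      bitLength++;
      if (filled === 8) { // byte is full, flush it
        out.push(current);
        current = 0;
        filled = 0;
      }
    }
  }
  if (filled > 0) {
    out.push(current << (8 - filled)); // zero-pad the final byte
  }
  return { bytes: Uint8Array.from(out), bitLength };
}
```

With this sketch, `encode('AEP')` gives a single byte, 01011000, and a `bitLength` of 6. The bit count has to travel with the bytes, otherwise the padding zeros would be read back as extra A's.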
So it seems we (potentially) need to join any leading bits that are the tail of a code started in the previous 8-bit chunk, and then split the remaining bits into characters. My question is basically: what is the recommended way of doing this? I can come up with ways of doing it, but they all seem rather complicated so far.
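For reference, this is roughly the simplest thing I can come up with that avoids strings for the bit handling. It is only a sketch: `DECODE` and `decode` are names I made up, it assumes MSB-first bit order, and it assumes the number of valid bits is known separately. The trick is to keep an integer accumulator with a leading sentinel 1 bit, so that the code `0` (key `0b10`) and a hypothetical `00` (key `0b100`) would never collide. Since the accumulator persists across byte boundaries, a code split over two bytes needs no special joining step:

```javascript
// Keys are the code bits with a sentinel 1 in front, e.g. '10' -> 0b110.
const DECODE = new Map([
  [0b10, 'A'],
  [0b110, 'E'],
  [0b1110, 'P'],
  [0b11110, ' '],
  [0b111110, 'D'],
  [0b1111110, 'T'],
  [0b1111111, 'L'],
]);

// Decode `bitLength` bits from `bytes` (a Uint8Array), one bit at a time.
function decode(bytes, bitLength) {
  let result = '';
  let acc = 1; // sentinel only, no code bits collected yet
  for (let i = 0; i < bitLength; i++) {
    const byte = bytes[i >> 3];              // byte that holds bit i
    const bit = (byte >> (7 - (i & 7))) & 1; // MSB-first within the byte
    acc = (acc << 1) | bit;
    const symbol = DECODE.get(acc);
    if (symbol !== undefined) {
      result += symbol;
      acc = 1; // reset for the next code
    }
  }
  return result;
}
```

For example, `decode(Uint8Array.of(0b01011000), 6)` gives `'AEP'`. Going one bit at a time is presumably not the fastest option, which is part of why I'm asking what the recommended approach is.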