
I'm currently trying to implement the Huffman algorithm, and both encoding and decoding work fine. However, I store the encoded data as follows.

The encoding function returns a list of strings made up of 0s and 1s, all of varying length.

If I saved them in a normal .txt file, they would take up more space. If I stored them as they are in a binary file, an 'e' with the code 101 might be stored as a full 8 bits, '00000101', which is wasteful and won't take up less storage than the original text file. So I joined all the strings in the list into one string and split it into chunks of length 8 to store them more efficiently.
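One catch with splitting into 8-character chunks: unless the total number of bits is a multiple of 8, the last chunk is shorter, and after reading the file back there is no way to tell how many trailing bits are padding. Below is a minimal Node.js sketch that handles this, assuming a file layout chosen only for illustration: the first byte records how many zero bits were appended to fill the last byte. The helper names packBits and unpackBits are hypothetical.

```javascript
// Hypothetical helpers. The 1-byte "padding count" header is an assumed
// layout for this sketch, not a standard Huffman file format.
function packBits(bits) {
  // Number of zero bits needed to fill the final byte (0..7)
  const padding = (8 - (bits.length % 8)) % 8;
  const padded = bits + '0'.repeat(padding);
  const out = Buffer.alloc(1 + padded.length / 8);
  out[0] = padding; // header byte
  for (let i = 0; i < padded.length; i += 8) {
    // Parse each 8-character slice as a base-2 number (0..255)
    out[1 + i / 8] = Number.parseInt(padded.slice(i, i + 8), 2);
  }
  return out;
}

function unpackBits(buf) {
  const padding = buf[0];
  let bits = '';
  for (let i = 1; i < buf.length; i++) {
    // toString(2) drops leading zeros, so pad back to 8 digits
    bits += buf[i].toString(2).padStart(8, '0');
  }
  // Remove the zero bits that were appended when packing
  return bits.slice(0, bits.length - padding);
}

const bits = '00011001000101001010'; // 20 bits, not a multiple of 8
const packed = packBits(bits);
console.log(packed.length); // 4 (one header byte + three data bytes)
console.log(unpackBits(packed) === bits); // true
```

The returned Buffer can be passed directly to fs.writeFileSync, and the Buffer from fs.readFileSync (called without an encoding) can be passed to unpackBits.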

However, when I read the data back, instead of 0s and 1s I get UTF-8 characters, even some escape characters.

I'm reading the file with fs.readFileSync("./encoded.bin", "binary");, but the result comes back as a string of strange characters instead of bits. Any solutions or ideas for converting it back to 0s and 1s?

I also tried switching "binary" to "utf-8" in fs.readFileSync("./encoded.bin", "binary");, which stopped my terminal from crashing, but the output is still "#��C��Ʃ��Ԧ�y�Kf�g��<�e�t".

To clarify, my end goal is to read back the long string of binary data, which would look like "00011001000101001010", as a string.
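A sketch of why the read comes back as characters, assuming Node.js: passing any encoding (here "binary", which Node treats as an alias for "latin1") makes fs.readFileSync return a string, while omitting the encoding returns a Buffer whose elements are byte values from 0 to 255. The temporary file path is an assumption made only so the example runs on its own.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Temporary file path chosen for this sketch only
const file = path.join(os.tmpdir(), 'encoded-demo.bin');
fs.writeFileSync(file, Buffer.from([0b00011001, 0b00010100])); // two raw bytes

// With an encoding ("binary" = "latin1") the result is a string
const asString = fs.readFileSync(file, 'binary');
console.log(typeof asString); // string

// Without an encoding the result is a Buffer of byte values (0..255)
const asBuffer = fs.readFileSync(file);
const bits = [...asBuffer]
  .map((byte) => byte.toString(2).padStart(8, '0'))
  .join('');
console.log(bits); // 0001100100010100

// From a latin1 string, charCodeAt recovers each original byte value
const bitsFromString = [...asString]
  .map((ch) => ch.charCodeAt(0).toString(2).padStart(8, '0'))
  .join('');
console.log(bitsFromString === bits); // true
```

Calling toString(2) directly on a character (as opposed to a number) just returns the character itself, which is why mapping over the string without charCodeAt still prints the same odd symbols.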

Royal Foxy

2 Answers


You can convert a string of 1s and 0s to the numerical value of a byte using Number.parseInt(str, 2), and convert it back with nr.toString(2).
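A small illustration of that round trip, showing why the conversion back needs padding: toString(2) returns the shortest binary form, so leading zeros disappear unless they are restored with padStart.

```javascript
// Parse a bit string as a base-2 number
const n = Number.parseInt('00000101', 2);
console.log(n); // 5

// toString(2) gives the shortest binary form, so leading zeros are lost
console.log(n.toString(2)); // 101

// padStart restores the fixed 8-bit width
console.log(n.toString(2).padStart(8, '0')); // 00000101
```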

The entire process will look something like this:

const fs = require('fs');

const original = '0000010100000111';
// Split the string into 8-char substrings (assumes the length is a multiple of 8)
const stringBytes = original.match(/.{8}/g);
// Convert each 8-char string to its numeric byte value
const numBytes = stringBytes.map((s) => Number.parseInt(s, 2));
// Pack the numbers into a Uint8Array (one byte per element)
const buffer = Uint8Array.from(numBytes);
// Write the raw bytes to the file
fs.writeFileSync('./encoded.bin', buffer);

// Read from the file (no encoding argument, so a Buffer is returned) and reverse the process
const bytes = fs.readFileSync('./encoded.bin');
const decoded = [...bytes].map((b) => b.toString(2).padStart(8, '0')).join('');

console.log('original', original, 'decoded', decoded, 'same', original === decoded);
Titus
  • I tried doing the following `[...fs.readFileSync("./encoded.bin", "binary")].map((b) => console.log(b.toString(2)));` which I hope was what you meant but the output is weirdly still the same => "2#ºCÿÙÆ©²Ô¦ÜyÆKfñg – Royal Foxy Apr 08 '22 at 16:56
  • @RoyalFoxy That won't work; you also have to encode the data the way I've illustrated in my answer. You'll have to do something like `fs.writeFileSync('./encoded.bin', buffer)`, and then `[...fs.readFileSync('./encoded.bin')].map((b) => b.toString(2).padStart(8, '0')).join('')` will work as expected. – Titus Apr 09 '22 at 04:14
  • I'm already saving it as a buffer, I think: `fs.writeFileSync("encoded.bin", Buffer.from(encode(string, codes).join("").match(/.{1,8}/g), "binary"));`. I also tried making a test file that just saves a few 0s and 1s to a file and reads them back, but Node reports the three dots at the start as an unexpected token: `[...fs.readFileSync("./encoded.bin", "binary")].map((b) => console.log(b.toString(2)));` – Royal Foxy Apr 09 '22 at 07:27
  • Instead of `[...fs.readFileSync("./encoded.bin", "binary")]` use `Array.from(fs.readFileSync("./encoded.bin"))`. It is not enough to save the data as binary; how you encode it matters, and if you don't encode it as in my example, this won't work. – Titus Apr 09 '22 at 11:20
  • The binary representation of the string `00000101` is different from the binary representation of an 8-bit unsigned integer that has its bits set to `00000101`. In the first case you'll have 8 values; in the second case you'll have just one. – Titus Apr 09 '22 at 11:25
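A short Node.js illustration of that difference. It also shows a related pitfall, based on how Buffer.from(array) converts its elements rather than on anything stated in the thread: bit strings passed directly to Buffer.from are converted as decimal numbers, so '00000101' is stored as 101 rather than 5.

```javascript
// The text "00000101" is eight characters, so it encodes to eight bytes
const asText = Buffer.from('00000101');
console.log(asText.length); // 8

// The number 0b00000101 (5) fits in a single byte
const asByte = Buffer.from([Number.parseInt('00000101', 2)]);
console.log(asByte.length, asByte[0]); // 1 5

// Array elements are converted with ordinary number coercion,
// which reads '00000101' as decimal 101, not binary 5
const coerced = Buffer.from(['00000101']);
console.log(coerced[0]); // 101
```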
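const fs = require("fs");

// Read the raw bytes (no encoding argument, so a Buffer is returned)
const bytes = fs.readFileSync("./binary.bin");
// Turn each byte into an 8-digit binary string and concatenate them
const binary = [...bytes].map((b) => b.toString(2).padStart(8, "0")).join("");
console.log(binary);

// Prints one long string of 0s and 1s, 8 digits per byte of the file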
ateqio
  • Thanks for your answer. It could be improved by adding an explanation as to what your code is doing and why. – Mark Dec 07 '22 at 11:21