
I am writing a JavaScript program that I am executing in the Node.js runtime environment.

I want to read binary data from a file into a buffer, but I keep running into issues.

Here is what I have done so far:
    $ cat test.js
    var fs = require('fs'),
        binary = fs.readFileSync('./binary', 'binary').toString('binary');
    process.stdout.write(binary.substring(0, 48));
    $ xxd binary
    00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000  .ELF............
    00000010: 0300 3e00 0100 0000 0008 0000 0000 0000  ..>.............
    00000020: 4000 0000 0000 0000 10a0 0000 0000 0000  @...............
    $ node test.js | xxd
    00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000  .ELF............
    00000010: 0300 3e00 0100 0000 0008 0000 0000 0000  ..>.............
    00000020: 4000 0000 0000 0000 10c2 a000 0000 0000  @...............
    00000030: 00                                       .

Notice how a 0xc2 byte is inserted at offset 0x29 when reading with Node. Why is that? I've specified the binary encoding for both readFileSync and toString. I've also tried ascii, but then I get a different and equally wrong result.

JΛYDΞV
Robert Larsen

2 Answers


The 'binary' encoding is an alias for 'latin1', which you clearly don't want when reading non-character data.
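A quick way to confirm the alias is to decode the same bytes both ways. This is a minimal sketch; the four sample bytes are arbitrary (the first two are the 0x10 0xa0 pair from offset 0x28 in the question). Note that latin1 decoding itself keeps one character per byte; the bytes only get mangled later, when that string is encoded back into bytes with a different encoding.

```javascript
// Sample bytes chosen for illustration; 0x10 0xa0 come from offset 0x28 in the question.
const bytes = Buffer.from([0x10, 0xa0, 0x7f, 0x45]);

// 'binary' is accepted as an encoding name...
console.log(Buffer.isEncoding('binary')); // true

// ...and decodes exactly like 'latin1': each byte becomes the code point with the same value.
const asBinary = bytes.toString('binary');
const asLatin1 = bytes.toString('latin1');

console.log(asBinary === asLatin1);               // true
console.log(asBinary.length);                     // 4 (one character per byte)
console.log(asBinary.charCodeAt(1).toString(16)); // a0
```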

If you want the raw data, don't specify an encoding at all (or supply null)*. You'll get a Buffer instead of a string, which you'd then want to use directly rather than using toString on it.

* (Some APIs [like fs.watch] also accept 'buffer', but it's not on the list of encodings and readFileSync doesn't say it does. [Thanks Patrick for providing the list link.])
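Here is a minimal sketch of reading the raw bytes. So that it runs anywhere, it writes its own small sample file into a temporary directory (the directory prefix and the byte values are made up for illustration), then reads it back with no encoding and with an explicit `{ encoding: null }`.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Create a throwaway sample file; the name and contents are hypothetical.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'raw-read-'));
const file = path.join(dir, 'binary');
fs.writeFileSync(file, Buffer.from([0x7f, 0x45, 0x4c, 0x46, 0x10, 0xa0]));

// No encoding: readFileSync returns a Buffer with the bytes exactly as stored.
const data = fs.readFileSync(file);
// An explicit null encoding has the same effect.
const same = fs.readFileSync(file, { encoding: null });

console.log(Buffer.isBuffer(data)); // true
console.log(data);                  // <Buffer 7f 45 4c 46 10 a0>
console.log(data.equals(same));     // true

// Clean up the temporary directory.
fs.rmSync(dir, { recursive: true });
```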

T.J. Crowder
    Is it just me, or is this absurd? – Nathan Chappell Oct 01 '20 at 21:42
    @NathanChappell - It *seems* absurd on the face of it, but makes more sense when you get into the details. In some sense, `latin1` is an alias for `binary` rather than the other way around. Details in the [list of encodings](https://nodejs.org/api/buffer.html#buffer_buffers_and_character_encodings) linked above and on [this MDN page](https://developer.mozilla.org/en-US/docs/Web/API/DOMString/Binary). – T.J. Crowder Oct 02 '20 at 06:45

Just to add some more information: this happens because you're passing a string to stdout.write(), which implicitly converts it back into a Buffer before writing it, using UTF-8 by default. The 'binary' (latin1) decoding turned the byte 0xa0 into the character U+00A0, and UTF-8 encodes that character as the two bytes c2 a0. You can see this in the Node.js REPL with the substring at offset 0x28 of your binary file:

    > Buffer.from('\u0010\u00a0')
    <Buffer 10 c2 a0>
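The whole round trip that produces the extra byte can be reproduced without touching the file. This sketch assumes only that the two bytes at offset 0x28 are 0x10 0xa0, as in the xxd dump in the question; `Buffer.from(string)` uses the same UTF-8 default as `stdout.write()`.

```javascript
// The two bytes at offset 0x28 of the file in the question.
const original = Buffer.from([0x10, 0xa0]);

// Step 1: decode as 'binary' (latin1), giving one character per byte: '\u0010\u00a0'.
const str = original.toString('binary');

// Step 2: turning the string back into bytes defaults to UTF-8.
// U+00A0 is above U+007F, so UTF-8 needs two bytes for it: c2 a0.
const utf8Bytes = Buffer.from(str);
console.log(utf8Bytes); // <Buffer 10 c2 a0>

// Re-encoding with the same encoding that was used for decoding restores the bytes.
const latin1Bytes = Buffer.from(str, 'latin1');
console.log(latin1Bytes.equals(original)); // true
```

In the same spirit, passing the matching encoding when writing a string (for example `process.stdout.write(str, 'latin1')`) should also emit the original bytes, but keeping the data in a Buffer avoids the detour through strings entirely.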

So as @T.J.Crowder correctly suggested, here's how to fix your script:

    var fs = require('fs'),
        binary = fs.readFileSync('./binary');
    process.stdout.write(binary.slice(0, 48));

This also uses Buffer#slice() instead of String#substring().
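One difference worth knowing: unlike String#substring(), Buffer#slice() does not copy. It returns a view over the same memory as the original Buffer, which is harmless in the script above but can surprise you if you modify the result. Newer Node.js documentation marks `slice()` as deprecated in favor of `subarray()`, which behaves the same way. A small sketch, with byte values chosen for illustration:

```javascript
const buf = Buffer.from([0x7f, 0x45, 0x4c, 0x46, 0x02, 0x01]);

// slice() returns a view over the same memory: no bytes are copied.
const head = buf.slice(0, 4);
head[0] = 0x00;
console.log(buf[0]); // 0 (the change is visible through the original buffer)

// subarray() also returns a view over the same bytes.
const view = buf.subarray(0, 4);
console.log(view.equals(head)); // true

// Buffer.from(buffer) makes an independent copy when one is needed.
const copy = Buffer.from(buf.subarray(0, 4));
copy[1] = 0x00;
console.log(buf[1].toString(16)); // 45 (the original is untouched)
```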

Patrick Roberts