185

I need to read a file which is encoded with ISO-8859-1 (also called latin1), something like this:

var file_contents = fs.readFileSync("test_data.html", "latin1");

However, Node complains about "latin1" or "ISO-8859-1" not being a valid encoding ("Error: Unknown encoding").

What encodings does readFileSync accept?

Mark Amery
dsign
  • I don't know if it's an option to convert the source file to utf8. If so, here is a link. https://superuser.com/questions/762473/ansi-to-utf-8-in-notepad – mathheadinclouds Jun 08 '17 at 21:21

4 Answers

285

The list of encodings that node supports natively is rather short:

  • ascii
  • base64
  • base64url (Node v14+)
  • hex
  • ucs2/ucs-2/utf16le/utf-16le
  • utf8/utf-8
  • binary/latin1 (ISO8859-1, latin1 only in node 6.4.0+)
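
On Node 6.4.0 or later, the call from the question should work as written. Here is a minimal sketch, assuming a scratch file in the OS temp directory (the directory prefix and the sample bytes are made up for illustration):

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical scratch file; the bytes spell "café" in ISO-8859-1 (0xE9 = é).
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'latin1-demo-'));
const file = path.join(dir, 'test_data.html');
fs.writeFileSync(file, Buffer.from([0x63, 0x61, 0x66, 0xe9]));

// 'latin1' maps each byte to the code point with the same value.
const fileContents = fs.readFileSync(file, 'latin1');
console.log(fileContents); // café

// Clean up the scratch file and directory.
fs.unlinkSync(file);
fs.rmdirSync(dir);
```

Note that the name passed is 'latin1' (or its alias 'binary'); the list above does not include the spelling 'ISO-8859-1'.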

If you are using a version older than 6.4.0, or don't want to deal with non-Unicode encodings, you can recode the file's contents yourself.

Use iconv-lite to recode files:

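var iconvlite = require('iconv-lite'); // third-party: npm install iconv-lite
var fs = require('fs');

// Read the raw bytes, then decode them with the given encoding (e.g. 'ISO-8859-1').
function readFileSync_encoding(filename, encoding) {
    var content = fs.readFileSync(filename); // no encoding given, so this is a Buffer
    return iconvlite.decode(content, encoding);
}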

Alternatively, use iconv:

var Iconv = require('iconv').Iconv;
var fs = require('fs');

function readFileSync_encoding(filename, encoding) {
    var content = fs.readFileSync(filename);
    var iconv = new Iconv(encoding, 'UTF-8');
    var buffer = iconv.convert(content);
    return buffer.toString('utf8');
}
Codebling
phihag
  • 1
    For those having problems compiling iconv I recommend [iconv-lite](https://github.com/ashtuchkin/iconv-lite). It does not require compilation and according to the dev is faster than iconv and it is used by popular tools like Grunt, Nodemailer, Yeoman, ... – Telémako Jan 13 '14 at 11:32
  • To use with Electron install lib and save as production dependency: `npm install iconv-lite --save` – DenisKolodin May 30 '16 at 11:13
  • for now, `latin1` is also included as alias of `ascii` – thybzi Oct 24 '16 at 21:26
  • 4
    What you said in the answer -- that ISO-8859-1 (aka `latin1`) is not supported -- is not the case for Node v8.1.4. See [this documentation page](https://nodejs.org/api/buffer.html#buffer_buffers_and_character_encodings) to see all the encodings supported by Buffer, which include `latin1` (whose alias is `binary`). – nbkhope Jul 13 '17 at 00:27
  • @phihag I went through the docs for previous versions, and it looks like `latin1` has been in the documentation since Node v6.4.0: https://nodejs.org/dist/v6.4.0/docs/api/buffer.html#buffer_buffers_and_character_encodings – nbkhope Jul 13 '17 at 19:15
  • It would be good if it was more explicit in the docs for readFile / readFileSync. `` isn't giving away much. – contrebis Aug 15 '19 at 12:13
  • 1
    I find it curious that according to https://nodejs.org/api/buffer.html#buffer_buffers_and_character_encodings encoding-names 'latin1' and 'binary' MEAN THE SAME THING. Why would that be? I got bit by this because I assumed that 'binary' would mean the same thing as "no encoding" – Panu Logic Sep 14 '19 at 21:37
  • 2
    @PanuLogic When you convert from a byte string to a character string, you need _some_ encoding. `latin1` is reversible – even if your input is an mp4 video file, the string is going to be valid (albeit meaningless), and converting back to bytes will yield the original bytes. Use a [`Buffer`](https://nodejs.org/api/buffer.html)/`Uint8Array` for binary data. – phihag Sep 14 '19 at 22:12
  • @phihag Good to know that latin1 can be used for that purpose. So I guess that if I read a file with the encoding argument set to 'latin1' and then write the resulting string to some other file, the contents will be identical. Do I need to specify an encoding when writing the file? – Panu Logic Sep 16 '19 at 21:15
  • 1
    @PanuLogic If you write without specifying an encoding, strings are written as UTF-8. Therefore, it is *not* safe to read as latin1 and write without an encoding; this will create [mojibake](https://en.wikipedia.org/wiki/Mojibake). Instead, read with `{encoding: null}` (or don't specify an encoding). That way, you'll get a Buffer. Buffers are already binary data, so when writing the `encoding` argument is ignored for them. – phihag Sep 17 '19 at 07:56
  • I just want to add that encoding names are not case-sensitive, so UTF-8, Utf-8, uTf-8, etc. all select utf-8, and the same applies to the other encodings. – Daniele Cruciani Jan 25 '23 at 10:12
  • It's not working: Node.js is installed, but I get the error "Uncaught Error. Error: Cannot find module 'iconv-lite'". It seems that my Node.js doesn't know 'iconv-lite'. My Node version is 'v18.16.1'. How do I fix the error? – Inherent Aug 31 '23 at 20:19
  • 1
    @Inherent you need to install it, with `npm install iconv-lite`. – phihag Sep 01 '23 at 13:20
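
The round-trip point made in the comments above can be checked with Buffers alone. This is a small sketch with arbitrary sample bytes (chosen for illustration), showing that latin1 is lossless in both directions while the default UTF-8 encoding changes the bytes:

```javascript
// Arbitrary bytes, including some that are not valid UTF-8 on their own.
const original = Buffer.from([0x00, 0x7f, 0x80, 0xe9, 0xff]);

// Bytes -> string -> bytes with 'latin1' on both sides is lossless.
const asString = original.toString('latin1');
const roundTripped = Buffer.from(asString, 'latin1');
console.log(roundTripped.equals(original)); // true

// Encoding the same string as UTF-8 (the default) changes the bytes:
// every character above U+007F becomes two bytes.
const asUtf8 = Buffer.from(asString); // same as Buffer.from(asString, 'utf8')
console.log(original.length, asUtf8.length); // 5 8
```

So a string read as latin1 has to be written back with 'latin1' as well; for binary data it is simpler to skip the string step and keep the Buffer.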
18

The encodings are spelled out in the buffer documentation.

Buffers and character encodings:

Character Encodings

  • utf8: Multi-byte encoded Unicode characters. Many web pages and other document formats use UTF-8. This is the default character encoding.
  • utf16le: Multi-byte encoded Unicode characters. Unlike utf8, each character in the string will be encoded using either 2 or 4 bytes.
  • latin1: Latin-1 stands for ISO-8859-1. This character encoding only supports the Unicode characters from U+0000 to U+00FF.

Binary-to-Text Encodings

  • base64: Base64 encoding. When creating a Buffer from a string, this encoding will also correctly accept "URL and Filename Safe Alphabet" as specified in RFC 4648, Section 5.
  • base64url (Node v14+): base64url encoding as specified in RFC 4648, Section 5. When creating a Buffer from a string, this encoding will also correctly accept regular base64-encoded strings. When encoding a Buffer to a string, this encoding will omit padding.
  • hex: Encode each byte as two hexadecimal characters.

Legacy Character Encodings

  • ascii: For 7-bit ASCII data only. Generally, there should be no reason to use this encoding, as 'utf8' (or, if the data is known to always be ASCII-only, 'latin1') will be a better choice when encoding or decoding ASCII-only text.
  • binary: Alias for 'latin1'.
  • ucs2: Alias of 'utf16le'.
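
To check which names a particular Node installation accepts, `Buffer.isEncoding()` can be queried directly. This is a minimal sketch; the candidate names are just examples, and the results depend on the Node version ('latin1' is only expected to pass on 6.4.0+):

```javascript
// Example spellings to probe; adjust the list as needed.
const candidates = ['latin1', 'binary', 'ISO-8859-1', 'utf8', 'UTF-8', 'uTf-8', 'ucs2', 'hex'];

// Map each name to whether this Node build recognises it as a Buffer encoding.
const results = {};
for (const name of candidates) {
  results[name] = Buffer.isEncoding(name);
}
console.log(results);
```

On current versions, 'ISO-8859-1' should come out false while 'latin1' and the mixed-case UTF-8 spellings come out true.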
Codebling
KyleMit
2

Starting at v12, Node supports these encoding values:

  • ascii
  • base64
  • hex
  • latin1
  • ucs2
  • utf16le
  • utf8

Node v14 and later add the base64url encoding.

Use this shell script to hackily extract the supported encodings from Node versions 12 through 19:

for v in {12..19}; do echo v$v; curl -L https://raw.githubusercontent.com/nodejs/node/v$v.x/lib/buffer.js --silent | grep -A 100000 'const encodingOps' | grep -B 100000 -E '^}' -m 1 | grep -Eo '^  [^ :}]+' | grep -Eo '[^ ]+' | sort; echo ---; done
Codebling
0

If the above solution does not work for you, it may be possible to obtain the same result with the following pure Node.js approach. The above did not work for me: running 'npm install iconv' on OS X resulted in a compilation error:

npm install iconv

npm WARN package.json portalServer@0.1.0 No README.md file found!
npm http GET https://registry.npmjs.org/iconv
npm http 200 https://registry.npmjs.org/iconv
npm http GET https://registry.npmjs.org/iconv/-/iconv-2.0.4.tgz
npm http 200 https://registry.npmjs.org/iconv/-/iconv-2.0.4.tgz

> iconv@2.0.4 install /Users/markboyd/git/portal/app/node_modules/iconv
> node-gyp rebuild

gyp http GET http://nodejs.org/dist/v0.10.1/node-v0.10.1.tar.gz
gyp http 200 http://nodejs.org/dist/v0.10.1/node-v0.10.1.tar.gz
xcode-select: Error: No Xcode is selected. Use xcode-select -switch <path-to-xcode>, or see the xcode-select manpage (man xcode-select) for further information.

fs.readFileSync() returns a Buffer if no encoding is specified, and Buffer has a toString() method that decodes the bytes as UTF-8 if no encoding is specified, giving you the file's contents. See the Node.js documentation. This worked for me.

markrboyd
  • 2
    The problem is that the Buffer assumes the data was already in utf8 if not specified, and will make no attempt to convert from latin1 to utf8. – bluesmoon Aug 03 '13 at 16:18
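
The comment above can be checked directly. This is a small sketch, using made-up sample bytes, of what happens when ISO-8859-1 data is decoded with the default `toString()`:

```javascript
// "café" in ISO-8859-1: the single byte 0xE9 is "é" there,
// but on its own it is an incomplete sequence in UTF-8.
const bytes = Buffer.from([0x63, 0x61, 0x66, 0xe9]);

const decodedAsUtf8 = bytes.toString();           // default encoding is 'utf8'
const decodedAsLatin1 = bytes.toString('latin1'); // explicit latin1

// The UTF-8 decoder substitutes U+FFFD (the replacement character) for the bad byte.
console.log(decodedAsUtf8 === 'caf\uFFFD');   // true
console.log(decodedAsLatin1 === 'caf\u00e9'); // true
```

For files that contain only ASCII characters the two decodings agree, which may be why the plain `toString()` appeared to work.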