I have to read a file encoded in UTF-16 using Node.js (in chunks, because it is very large). The data from the file will go into MongoDB, so I will need to convert it to UTF-8. From googling, it seems that this is just plain not supported by Node, and I will have to resort to converting the raw data from a buffer myself. But I also think there ought to be a better way and I'm just not finding it. Any suggestions?

Thanks.

Ryan Ballantyne

2 Answers
Replace the normal 'utf8' encoding you'd pass when reading a text file with 'utf16le' or 'ucs2':

var fileContents = fs.readFileSync('import.csv','utf16le')

or:

var fileContents = fs.readFileSync('import.csv','ucs2')

Also, for anyone searching the internet: if you're seeing additional � (Unicode replacement) characters appearing in a parsed file, this is probably the cause of your problem. Read the file as UTF-16/UCS-2 and the extra characters will disappear.

mikemaccana
    Additional notes: 'ucs2' is an alias of 'utf16le' (little-endian). You can also use 'utf16le' if that will make your intent clearer in that particular instance. `Buffer.prototype.toString()` also supports the same encoding parameters. If your source is big-endian, you need to first swap the bytes using `Buffer.prototype.swap16()` since Node.js currently doesn't have a 'utf16be' encoding. – Ates Goral Mar 19 '18 at 11:44
  • Thanks @ates that's a good suggestion, I've updated the answer to include it. – mikemaccana Mar 20 '18 at 12:26
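The byte-swap approach from the comment above can be sketched like this (the sample buffer is illustrative):

```javascript
// A UTF-16BE buffer for the string "hi" (big-endian: high byte first).
const beBuffer = Buffer.from([0x00, 0x68, 0x00, 0x69]);

// Node has no 'utf16be' encoding, so swap each 16-bit unit in place
// to little-endian, then decode as usual.
beBuffer.swap16();
const text = beBuffer.toString('utf16le');
console.log(text); // 'hi'
```

Note that `swap16()` mutates the buffer in place and throws if the buffer length is not a multiple of 2.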
Node supports UCS-2, the UTF-16 subset supported by JavaScript. Try using that.

See this pull request.

Matthew Ratzloff