I have to read a file encoded in UTF-16 using Node.js (in chunks, because it is very large). The data from the file will go into MongoDB, so I will need to convert it to UTF-8. From googling, it seems that this is just plain not supported by Node, and I will have to resort to converting the raw data from a buffer myself. But I also think there ought to be a better way and I'm just not finding it. Any suggestions?

Thanks.

Ryan Ballantyne

2 Answers
Replace the normal 'utf8' encoding you'd pass when reading a text file with 'utf16le' or 'ucs2':

var fileContents = fs.readFileSync('import.csv','utf16le')

or:

var fileContents = fs.readFileSync('import.csv','ucs2')

Also, for anyone searching the internet: if you're seeing additional � (Unicode replacement) characters appearing in a parsed file, this is probably the cause of your problem. Read the file as UTF-16/UCS-2 and the extra characters will disappear.

mikemaccana
    Additional notes: 'ucs2' is an alias of 'utf16le' (little-endian). You can also use 'utf16le' if that will make your intent clearer in that particular instance. `Buffer.prototype.toString()` also supports the same encoding parameters. If your source is big-endian, you need to first swap the bytes using `Buffer.prototype.swap16()` since Node.js currently doesn't have a 'utf16be' encoding. – Ates Goral Mar 19 '18 at 11:44
  • Thanks @ates that's a good suggestion, I've updated the answer to include it. – mikemaccana Mar 20 '18 at 12:26
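The byte-swap approach from the comment above can be sketched like this (the sample buffer is illustrative):

```javascript
// A UTF-16BE buffer for the string "hi" (big-endian: high byte first).
const beBuffer = Buffer.from([0x00, 0x68, 0x00, 0x69]);

// Node has no 'utf16be' encoding, so swap each 16-bit unit in place
// to little-endian, then decode as usual.
beBuffer.swap16();
const text = beBuffer.toString('utf16le');
console.log(text); // 'hi'
```

Note that `swap16()` mutates the buffer in place and throws if the buffer length is not a multiple of 2.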
Node supports UCS-2, the UTF-16 subset supported by JavaScript. Try using that.

See this pull request.

Matthew Ratzloff