
In C, I am trying to read a binary file whose contents are encoded as UTF-16.

I tried this:

binaryFile = fopen("data.dat", "rb, ccs=UTF16LE");

It did not work. I specifically need to do this without using a UTF-16 reading library, and I can't think of any other solution.

Can you help me? Thanks in advance.

bebibabi
  • Remove the `, ccs=UTF16LE` — unless the manual page for `fopen()` on your machine says it is supported. – Jonathan Leffler Apr 23 '22 at 12:52
  • I'd use libicu for working with UTF-16 encoded text, including I/O. – Shawn Apr 23 '22 at 12:55
  • UTF-16 is a text encoding - so if it's in UTF-16, it's not really a "binary file" in the normal meaning of the term... it's pretty unclear what you're trying to do, at the moment. – Jon Skeet Apr 23 '22 at 12:59
  • @JonathanLeffler My machine accepts `fopen("data.dat", "rb")`, but then the file is read as UTF-8. – bebibabi Apr 23 '22 at 13:10
  • @JonSkeet Sorry, I'm new here. I've got a binary file. I will read it and write it in a xml file, but I have to do it with UTF-16. – bebibabi Apr 23 '22 at 13:12
  • You'll need to show a lot more code showing how you read the file and a hex dump of the first 64 bytes or so of the file (`xxd -ug1 data.dat | sed 4q`), and then explain how you deduce that the file is read as UTF8 even though it is UTF16LE. – Jonathan Leffler Apr 23 '22 at 13:12
  • What do you mean by "I will read it and write it in a xml file"? Please read https://codeblog.jonskeet.uk/2010/08/29/writing-the-perfect-question/ and edit your question to be *much* clearer about what you're trying to do, the code you've got, and what's happening at the moment. – Jon Skeet Apr 23 '22 at 13:20

1 Answer


According to the Microsoft documentation for `fopen`, the Microsoft C runtime supports a non-standard `ccs=` extension in the mode string to specify the text encoding, but you misspelled the encoding name: it is `UTF-16LE`, with a dash. It should be:

binaryFile = fopen("data.dat", "rb, ccs=UTF-16LE");

This is a typical example of the Embrace, Extend and Extinguish strategy deployed by this company in an attempt to lock developers into their walled garden.

Making the `ccs` flag case sensitive and using dashes are regrettable design choices. Lagging on UTF-8 adoption and encouraging 16-bit text file support are plaguing programmers to this day. If you can, have the encoding of this file changed to UTF-8.
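If the `ccs=` extension is not available on your platform, or you really cannot use a library, one portable option is to open the file in plain `"rb"` mode and decode the UTF-16LE bytes yourself. Below is a minimal sketch of that idea, not a definitive implementation. The function names `put_utf8`, `utf16le_to_utf8` and `read_utf16le_file` are made up for illustration, the file is assumed to be little-endian (an optional `FF FE` byte order mark is skipped), unpaired surrogates are replaced with U+FFFD, and for brevity only the first 4096 bytes of the file are read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Write one code point to out as UTF-8; returns the number of bytes written (1 to 4). */
size_t put_utf8(uint32_t cp, char *out)
{
    if (cp < 0x80) {
        out[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (char)(0xF0 | (cp >> 18));
    out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Decode in_len bytes of UTF-16LE into NUL-terminated UTF-8.
   Stops early when out is nearly full. Returns the UTF-8 length, excluding the NUL. */
size_t utf16le_to_utf8(const unsigned char *in, size_t in_len,
                       char *out, size_t out_cap)
{
    size_t i = 0, n = 0;
    assert(in != NULL && out != NULL && out_cap > 0);

    /* Skip a leading byte order mark (U+FEFF stored as FF FE). */
    if (in_len >= 2 && in[0] == 0xFF && in[1] == 0xFE)
        i = 2;

    while (i + 1 < in_len && n + 4 < out_cap) {
        /* One 16-bit code unit, low byte first. */
        uint32_t cp = (uint32_t)in[i] | ((uint32_t)in[i + 1] << 8);
        i += 2;
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: a low surrogate must follow to form a pair. */
            uint32_t lo = 0;
            if (i + 1 < in_len)
                lo = (uint32_t)in[i] | ((uint32_t)in[i + 1] << 8);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                i += 2;
            } else {
                cp = 0xFFFD; /* unpaired high surrogate */
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD; /* stray low surrogate */
        }
        n += put_utf8(cp, out + n);
    }
    out[n] = '\0';
    return n;
}

/* Read up to 4096 bytes of a file in binary mode and decode them.
   Returns 0 on success, -1 if the file cannot be opened. */
int read_utf16le_file(const char *path, char *out, size_t out_cap)
{
    unsigned char buf[4096];
    size_t len;
    FILE *fp = fopen(path, "rb"); /* plain binary mode, no ccs flag */
    if (fp == NULL)
        return -1;
    len = fread(buf, 1, sizeof buf, fp);
    fclose(fp);
    utf16le_to_utf8(buf, len, out, out_cap);
    return 0;
}
```

A call such as `read_utf16le_file("data.dat", text, sizeof text)` with a `char text[8192]` buffer would leave the decoded text in `text` as UTF-8, ready to be written to the XML file with `fputs`. For larger files you would read in a loop and take care not to split a code unit or a surrogate pair across two reads.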

chqrlie