9

I am trying to write a UTF-16 encoded file using std::ofstream. Even in binary mode, writing "\n\0" comes out as "\r\n\0". Sample code:

std::string filename = ...
std::ofstream fout(filename, std::ios_base::binary);
fout.write("\xff\xfe", 2);
fout.write("\n\0", 2);
fout.close();

The resulting file's hex data is:

ff fe 0d 0a 00

I must be doing something wrong. Any ideas to prevent the 0x0d from being written?

I am using MS Visual Studio 2013.

Update: It inexplicably started working as expected. Chalk it up to ghosts in the machine.

Jason

2 Answers

1

You sent 4 bytes to be output. 5 were observed in the output.

You were somehow not using binary mode. There is no other way you could call .write(buf, 2) twice and get 5 bytes of output.

Likely, while messing around with things (as people always do when trying to figure out odd behavior), something you changed caused the stream to actually open in binary mode.

If you were earlier attempting to output to either STDOUT or STDERR, it's entirely possible that Windows was automatically adding the '\r' into the stream, because STDOUT and STDERR are almost always text, and this could have been overriding your attempt to put the stream into binary mode. (No, really. Yes, if you use Cygwin this isn't true, but you're using VS.)
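
As a minimal sketch of that text-mode vs. binary-mode difference (assuming Windows/MSVC; the file names here are arbitrary), the same one-byte write produces different output depending on the open mode:

#include <fstream>

int main() {
    // Text mode (the default): the Windows runtime translates '\n' to CR+LF,
    // so the file ends up containing the two bytes 0d 0a.
    std::ofstream text_out("text_mode.out");
    text_out.write("\n", 1);
    text_out.close();

    // Binary mode: bytes pass through untranslated,
    // so the file contains the single byte 0a.
    std::ofstream bin_out("binary_mode.out", std::ios_base::binary);
    bin_out.write("\n", 1);
    bin_out.close();
    return 0;
}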

-5

That's by design. The \n character is converted to the EOL marker for your platform, so ofstream::write is interpreting it correctly. If you want to write a binary file, you can't use special text characters.

Clarification: I managed to create a bit of confusion over what the compiler is doing. Basically, \n is a special character which means "EOL/End of Line". This is different depending on what platform you're compiling on.

Now the write() function takes an array of bytes to write to the stream. The C standard doesn't really differentiate between a string (technically no such thing in C) and an array of chars (or bytes), so it lets you get away with this. What is happening during compile time is that those lines are getting converted to something like this:

fout.write({255, 254, 0}, 2);   // "\xff\xfe"
fout.write({13, 10, 0, 0}, 2);  // "\n\0"
fout.close();
Jesse Weigert
  • Would passing `\x0A` (the line feed character) work? I think only `\n` has the special meaning, but I'm not sure. – SirGuy Sep 01 '15 at 20:14
  • Possibly? If you want to write UTF-16 files, you really should use the correct library functions (usually the wc functions) to do it and not try to hand-code UTF; see the sketch after this comment thread. The standard is complicated enough that you're likely not going to do it right. – Jesse Weigert Sep 01 '15 at 20:17
  • The new line translation to CR+LF happens in the runtime, not the compiler. And it is only supposed to do so in text mode, not binary. Using `fout.write("\x0a", 1)` has the same effect. – Jason Sep 01 '15 at 20:22
  • Where in the C or C++ Standards does it state the compiler interprets or converts `\n` to `\n\r` in string or character literals? The library might do it for text mode files but that's a far cry from the compiler interpreting characters as something they aren't. – Captain Obvlious Sep 01 '15 at 20:23
  • @JesseWeigert, this is a VERY WRONG answer indeed. The compiler has nothing to do with converting \n to \n\r. – SergeyA Sep 01 '15 at 20:23
  • @SergeyA \n is a platform-dependent, non-Unicode character. You can't just stuff a zero after it and expect it to work. I think you might be right about this happening at runtime though. Regardless of when it happens, it's the correct behavior. – Jesse Weigert Sep 01 '15 at 20:25
  • @JesseWeigert, not sure what you are arguing about. All I said was that the compiler is not involved in the conversion you are mentioning. It is still true. – SergeyA Sep 01 '15 at 20:28
  • Ahh.. I see. I mistyped my answer. \n doesn't get converted to \n\r, it gets converted to the platform-specific EOL marker. Updated my answer to be more correct. :-) – Jesse Weigert Sep 01 '15 at 20:32
  • _"If you want to write a binary file, you can't use special text characters."_ - They're writing in binary mode so the translation should not occur. Your answer is still wrong based on the code and question. – Captain Obvlious Sep 01 '15 at 20:34
  • _"If you want to write a binary file, you can't use special text characters."_ - This is flat out wrong. How else are you going to write a binary file if you can't write certain characters? _"What is happening during compile time"_ - No, it is _not_ happening at compile time. Even when the translation occurs at _runtime_ it only happens when the stream is opened in text mode, not binary. – Captain Obvlious Sep 01 '15 at 21:12