
I know, from experience, that the following code:

#include <iostream>

int main()
{
    std::cout << "Hello World!\n";
    return 0;
}

results in different line endings being printed on different platforms (e.g. Linux: LF, Windows: CRLF) and that I sometimes have to switch cout to binary mode if I want specific behaviour. Likewise I know that with filestreams I open myself I have to be careful to specify text or binary mode for my desired line-ending behaviour.

However I'm struggling to find where this behaviour of converting \n to CRLF is actually documented!

I've looked in the C++ spec (specifically C++98 through to 22) and the various online references (e.g. cppreference.com) and can't find which class / library routine is responsible for *actually converting the \n into the platform-specific line end*. (Also, don't ask ChatGPT, it's happily making up quotes from the spec that don't exist.)

Or to phrase it another way: Where is the behaviour of C++'s text-mode and binary-mode streams specified?

If it cannot be found in the C++ spec, then the question is: Is it inherited behaviour from C? If so where is that defined?

Or is this something that C just inherits from the platforms it runs on?

Pod
  • This is (IIUC) one of those things C++ inherits from C, so in that respect this might be a [tag:c] question? Also, cppreference [only has a note](https://en.cppreference.com/w/c/language/escape#Notes) explaining this so there might not even be anything normative; certainly, the actual conversion is *not* specified as it is platform-dependent and happens behind any OS API. – You May 15 '23 at 09:21
  • I've had a look in the C spec as well and can't find it either. Is this just a case of inherited behaviour over 30+ years that no-one has bothered to officially write down?! – Pod May 15 '23 at 09:25
  • @Pod: The answer to questions like that is basically always "no". ;-) Some things about streams and strings are mentioned only in the intro chapters on Input/Output (stdio.h) and String handling (string.h), which made me miss some of that myself. Happens. ;-) – DevSolar May 15 '23 at 09:42
  • Some platforms don't even use a specific line end. For example, IBM mainframes store a character count with the string, and remove the `'\n'`. (Probably did that before C was designed :-) – BoP May 15 '23 at 09:50
  • "which specific API does this in the C++ stdlib" It isn't clear what you mean by this. Every function that reads from a file or writes to a file does this eventually. If you must know *how* it is done (e.g. which lowest-level function they call to perform the translation) you need to read the source code of your C++ and/or C standard library implementations. There is no other way. – n. m. could be an AI May 15 '23 at 09:57
  • @Pod Extended my answer regarding your latest edit. Feel free to further specify what you're actually looking for if I didn't make it clear yet. – DevSolar May 15 '23 at 11:50
  • @n.m. For that (now removed) reference I was mainly after asking "how far down the stack do we have to go until we find something that actually intentionally documents this behaviour?" as I was keen to understand how much of this is historical baggage and how much is intentional, i.e. does C++ do it because the Unix IO API originally did this and it's just inherited it via most C implementations? – Pod May 17 '23 at 12:05
  • What exactly are you calling "historical baggage"? On Windows, CRLF is the de-facto and de-jure line ending now, not in any irrelevant history. – n. m. could be an AI May 17 '23 at 13:05

1 Answer


From the C standard, 7.21.2 Streams, emphasis mine:

A text stream is an ordered sequence of characters composed into lines, each line consisting of zero or more characters plus a terminating new-line character. Whether the last line requires a terminating new-line character is implementation-defined. Characters may have to be added, altered, or deleted on input and output to conform to differing conventions for representing text in the host environment. Thus, there need not be a one-to-one correspondence between the characters in a stream and those in the external representation. Data read in from a text stream will necessarily compare equal to the data that were earlier written out to that stream only if: the data consist only of printing characters and the control characters horizontal tab and new-line; no new-line character is immediately preceded by space characters; and the last character is a new-line character. Whether space characters that are written out immediately before a new-line character appear when read in is implementation-defined.

A binary stream is an ordered sequence of characters that can transparently record internal data. Data read in from a binary stream shall compare equal to the data that were earlier written out to that stream, under the same implementation. Such a stream may, however, have an implementation-defined number of null characters appended to the end of the stream.

C++ basically inherits this definition.

Referring to the edit of your question:

If no documentation can be found, then a substitute answer would be knowledge of which specific API does this in the C++ stdlib, C stdlib or various OS platforms.

The "API" you are looking for is opening the stream in text mode.

You write printf( "Hello Bob!\n" ) or std::cout << "Hello Bob!\n", and the library implementation does whatever conversion is necessary (which might not be limited to line endings).

DevSolar
  • Whilst the quotes from the C spec apply in the sense that they're saying "anything can happen!" I was wondering *which specific part of the text stack actually says 'I will swap \n for the platform appropriate line ending'*. This behaviour seems to mostly be some kind of "informal agreement" amongst all compiler writers based on the behaviour of some early implementations? e.g. Early MSVC converted \n to CRLF because they thought that was the most useful thing, so gcc followed suit when it was ported to Windows? – Pod May 17 '23 at 12:06
  • @Pod The standard does specify what is required from, and allowed for, a C implementation (compiler / library) to be considered "conforming", a.k.a. "not broken". It does so while taking great care *not* to infringe on platform / CPU specifics, so conforming implementations can exist on a wide range of platforms. If a *platform* has a specific convention, i.e. what to consider a "line ending" (LF? CR? CR/LF? LF/CR? RS? US? Null byte? ...), the above specification of "text mode can do conversion" allows for the implementation to translate between C space (`\n`) and whatever the platform does. – DevSolar May 17 '23 at 12:22
  • (ctd.) I.e., what the standard calls "external representation" is not within the scope or authority of the C standard. It is in the wide field of "implementation-defined" and "implementation-specific" behavior. Note that this may go way beyond just line endings. – DevSolar May 17 '23 at 12:25