string s;
while(getline(cin,s)){
    cout << "---" << endl;
    for(auto c: s) cout << int(c) << endl;
}
cout << "Exiting";

If my input is Ctrl+Z, then I press enter once, and my program exits immediately.

^Z
Exiting

If I enter a character before pressing Ctrl+Z, then I have to press enter twice, and my program does not exit.

s^Z

---
115
26

I had always interpreted Ctrl+Z as the EOF character. getline would continue until it reaches this character, at which point getline tests false and my program would exit. I'm curious why my program interprets Ctrl+Z as the substitute character 26, depending on whether there is a preceding character or not, and why it was necessary for me to press Enter twice in the second example?

Fabrizio
Silversonic
    The Ctrl+Z at the start of the line is handled by the Windows console itself, but oddly it's only for a generic [`ReadFile`](https://msdn.microsoft.com/en-us/library/aa365467) call, not the specific `ReadConsole` call. In this case the read returns `lpNumberOfBytesRead` as 0, which is what the C/C++ runtime interprets as EOF. – Eryk Sun Aug 14 '17 at 11:48
    Having to press enter twice in the second example is odd. It's keeping the SUB ("\x1a") character in the buffer and dropping everything after it, including the CRLF line ending. If it didn't keep SUB in the result it would almost be understandable, since the Windows C and C++ runtimes treat this character as an EOF marker. ISTM the correct behavior here should be to return from `getline` with whatever preceded SUB on the line, not to keep it in the buffer and continue reading. – Eryk Sun Aug 14 '17 at 12:08

1 Answer


26 is the code of ^Z on your platform, and ^Z is the EOF marker for the terminal, that's true. Characters with codes below 32 are control characters on ASCII-compatible platforms, as you probably know. Code 26 is an actual control code (ASCII names it SUB, "substitute"); ^Z, or whatever placeholder glyph the terminal shows, is only the notation used to display it. getline reads input until it meets the delimiter (EOL, '\n' by default) or the end of the stream (EOF, which the terminal signals when ^Z starts a line). When ^Z follows other characters on the line it is delivered as an ordinary character with code 26, which is what your second output shows. That behavior is expected.
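To check that the C++ library itself gives character 26 no special meaning, here is a minimal sketch that runs the question's loop over an in-memory stream instead of the console. The helper name `line_codes` is made up for this illustration, and the sketch assumes an ASCII-compatible character set.

```cpp
#include <istream>
#include <sstream>  // std::istringstream, used by callers of the helper
#include <string>
#include <vector>

// Hypothetical helper: reads every line from `in` with std::getline and
// returns the numeric code of each character, one inner vector per line.
std::vector<std::vector<int>> line_codes(std::istream& in) {
    std::vector<std::vector<int>> result;
    std::string s;
    while (std::getline(in, s)) {          // stops only at '\n' or end of stream
        std::vector<int> codes;
        for (unsigned char c : s) codes.push_back(c);
        result.push_back(codes);
    }
    return result;
}
```

Feeding it a `std::istringstream` that holds `s`, character 26 and a newline yields a single line with the codes 115 and 26, the same values as in the question. The loop ends only when the stream itself runs out, so the special handling of ^Z at the start of a line has to come from the console layer, not from getline.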

Whether characters are sent to the input buffer immediately or only after some flush command is defined by the platform (more precisely, by the terminal type). The usual cause of a buffer flush is the EOL character, which is what your Enter key produces (CR, carriage return). That's why the program receives EOF only after Enter in your case. Note that some platforms use LF (line feed) as EOL, and some use the pair CR+LF. On text-mode streams the C literal '\n' is translated to and from the platform's EOL marker.
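As an illustration of the EOL point: that translation applies to text-mode streams only, so lines read from an in-memory or binary-mode stream that holds CR+LF endings keep the CR. A minimal sketch, where the helper name `read_lines_strip_cr` is an assumption made for this example:

```cpp
#include <istream>
#include <sstream>  // std::istringstream, used by callers of the helper
#include <string>
#include <vector>

// Hypothetical helper: reads lines with std::getline and drops one trailing
// '\r', so CR+LF and LF line endings produce the same result.
std::vector<std::string> read_lines_strip_cr(std::istream& in) {
    std::vector<std::string> lines;
    std::string s;
    while (std::getline(in, s)) {
        if (!s.empty() && s.back() == '\r') s.pop_back();
        lines.push_back(s);
    }
    return lines;
}
```

Called on a `std::istringstream` holding `"ab\r\ncd\n"`, it returns the two lines `ab` and `cd`.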

Note that you can use a different delimiter:

template< class CharT, class Traits, class Allocator > 
std::basic_istream<CharT,Traits>& getline( 
         std::basic_istream<CharT,Traits>& input,
         std::basic_string<CharT,Traits,Allocator>& str,
         CharT delim );
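A short sketch of this overload in use, splitting a string on an arbitrary delimiter. The helper name `split_fields` is hypothetical; only `std::getline` and `std::istringstream` come from the standard library.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` at every occurrence of `delim` using
// the three-argument std::getline overload.
std::vector<std::string> split_fields(const std::string& text, char delim) {
    std::vector<std::string> fields;
    std::istringstream in(text);
    std::string field;
    while (std::getline(in, field, delim)) {  // delim is consumed, not stored
        fields.push_back(field);
    }
    return fields;
}
```

For example, `split_fields("a,b,,c", ',')` gives the four fields `a`, `b`, an empty string and `c`. Passing `'\x1a'` as the delimiter makes getline stop at character 26 instead of at the end of the line, so that character is consumed and never shows up in the result.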

[Image: ASCII table with the substitute Control+key combinations]

Swift - Friday Pie