
When connected to a remote BASH session via SSH (with the terminal type set to vt100), the console command line will soft-wrap when the cursor hits column 80.

What I am trying to discover is whether the <space><carriage return> sequence that gets sent at this point is documented anywhere.

For example, sending the following string

    std::string str = "0123456789"  // 1
                        "0123456789"
                        "0123456789"    // 3
                        "0123456789"
                        "0123456789"    // 5
                        "012345678 9"
                        "0123456789_"   // 7
                        "0123456789"
                        "0";

gets the following response back from the host (Linux Mint as it happens)

01234567890123456789012345678901234567890123456789012345678<WS><WS><CR>90123456789_01234567890
Jordan Running
J Evans
  • How is that column 80? It looks like column 58. When you write `<WS>` do you mean `0x20`? And what exactly is the code you write as `<CR>`? `0x0d`? Finally, how are you capturing the output? – rici Jul 11 '15 at 18:50
  • A reasonable point for clarification. What is hidden of course is the Bash command line prompt. In this case ```user01@acermint ~ $ ```. `<CR>` is '\r' and `<LF>` is '\n'. `<WS>` is indeed 0x20. The output is dumped when data is read from the SSH socket whilst connected to a Linux host – J Evans Jul 12 '15 at 12:15
  • So are you sending that data over the ssh connection to bash, and looking at how bash echoes it? If that is the case, what you are seeing is essentially the behaviour of the readline library. (Which is certainly not documented in that detail outside of code comments.) – rici Jul 12 '15 at 18:53
  • Yes, technically it is ```readline``` generating the content. But it too should be adhering to a specification no? – J Evans Jul 13 '15 at 09:46
  • I don't see why there needs to be a specification. It's not a standardized sequence; it just happens to "work" for some pragmatic definition of "work" in certain applications. I rewrote my answer to try to make this more precise. – rici Jul 13 '15 at 15:58
  • I don't suppose this counts as documentation, exactly, but the thread does describe vt100 behaviour. From 1987: https://groups.google.com/d/topic/comp.terminals/ZGViXbhtWc4/discussion (Note that it is described as an "undocumented feature" when not being described as a bug.) – rici Jul 15 '15 at 00:27
  • Hi Rici. You've got an incredibly detailed response here. But I guess my core problem remains this: If this soft-wrap sequence is not a standard then any console app is going to have to have special knowledge about the remote connection. – J Evans Jul 20 '15 at 12:30
  • I don't understand your issue, or your terminology, or something. What is it precisely that you want to do? Here, bash is the (remote) console app and xterm (say) is the (local) terminal emulator. They understand each other fine. What part do you want to replace? If you want to write a console app, use a library like readline or ncurses. If you want to write an emulator, make sure your terminfo description is accurate. – rici Jul 20 '15 at 13:34
  • A concrete example. A windows application uses OpenSSH to create a remote shell and connects as a VT100 terminal. How is said app expected to deal with this 'soft-break' quirk (terminology might not be 100% here but it is clearly not part of the VT100 spec)? Does this make sense? – J Evans Jul 24 '15 at 14:42
  • it is expected to respond as though it were emulating a vt100. The space character causes the cursor to advance to the next line, possibly scrolling, and then outputting a space, advancing the cursor to the next column. The carriage return then returns the cursor to the first column, and the next character will overwrite the space. What is the problem? – rici Jul 24 '15 at 14:49
  • In short, the "quirk" simply uses documented behaviour of the vt100 to achieve an output. If you expect to parse vt100 codes for *semantics*, then you are in for a rough ride, and you should probably present as a dumb terminal rather than a vt100. Ncurses and readline both play lots of games in an attempt to minimize data transmission, and that can make *interpreting*, rather than displaying, an exercise in frustration. – rici Jul 24 '15 at 14:54
  • ```it is expected to respond as though it were emulating a vt100```. Don't you see the contradiction here. This is precisely *not* documented in the VT100 specification. !!! – J Evans Jul 26 '15 at 18:05
  • It doesn't need to be documented. The "space return" sequence does nothing special; the behaviour of the space and return characters is documented, and the visible result of the sequence is precisely that of a space then a return. There is no such thing as a "soft break"; that is a figment of your imagination. It is not visible on the display; the VT100 does not react differently to a soft break than a hard break or, for that matter, a simple autowrap. So your hypothetical windows application does not need to know anything other than what the VT100 documentation provides... – rici Jul 26 '15 at 18:58
  • The imaginary "soft break" which you think you need to care about (or perhaps you want to detect) is a semantic superstructure which is not part of the VT100 communications. If you want to reverse engineer particular terminal libraries' idiosyncratic behaviours at the right-hand margin -- in which some will just autowrap while others will output a redundant space/return pair and others may do something even different, like output an explicit newline -- you are certainly within your rights, but you can't expect the VT100 docs to help you because *it doesn't matter to the display*. – rici Jul 26 '15 at 19:02

2 Answers


The behaviour observed is not really part of bash; rather, it is part of the behaviour of the readline library. It doesn't happen if you simply use echo (which is a bash builtin) to output enough text to force an automatic line wrap, nor does it happen if bash produces an error message which is wider than the console. (Try, for example, the command `.` with an argument of more than 80 characters not corresponding to any existing file.)

So it's not an official "soft-wrap sequence", nor is it part of any standard. Rather, it's a pragmatic solution to one of the many irritating problems related to console display management.

There is an ambiguity in terminal implementation of line wrapping:

  1. The terminal wraps after a character is inserted at the rightmost position.

  2. The terminal wraps just before the next character is sent.

As a result, it is not possible to reliably send a newline after the last column position. If the terminal had already wrapped (option 1 above), then the newline will create an extra blank line. Otherwise (option 2), the following newline will be "eaten".

These days, almost all terminals follow some variant of option 2, which was the behaviour of the DEC VT-100 terminal. In the vocabulary of the terminfo terminal description database, this is called xenl: the "eat-newline-glitch".

There are actually two possible subvariants of option 2. In the one actually implemented by the VT-100 (and xterm), the cursor ends up in an anomalous state at the end of the line; effectively, it is one character position off the screen, so you can still backspace the cursor in the same line. Other historic terminals "ate" the newline, but positioned the cursor at the beginning of the next line anyway, so that a backspace would not be possible. (Unless the terminal has the bw capability.)
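
The difference between the two options can be illustrated with a toy cursor model. This is only a sketch under stated assumptions: `WrapPolicy`, `Cursor` and `feed` are hypothetical names, scrolling and escape sequences are ignored, and the deferred policy models the VT100-style variant in which the cursor stays parked at the right margin until the next printable character arrives.

```cpp
#include <cassert>
#include <string>

// Hypothetical model: the two wrapping behaviours described above.
enum class WrapPolicy {
    Immediate,  // option 1: wrap as soon as the last column is filled
    Deferred    // option 2: wrap only when the next printable character arrives
};

struct Cursor {
    int row;
    int col;
};

// Feed bytes to an imaginary terminal of the given width and report where
// the cursor ends up. Only CR, LF and printable characters are modelled.
Cursor feed(const std::string& bytes, int width, WrapPolicy policy) {
    assert(width > 0);
    int row = 0;
    int col = 0;
    bool pending_wrap = false;  // the "anomalous state" at the right margin
    for (char ch : bytes) {
        if (ch == '\r') {
            col = 0;
            pending_wrap = false;
        } else if (ch == '\n') {
            ++row;
            pending_wrap = false;
        } else {
            if (pending_wrap) {  // the deferred wrap happens now
                ++row;
                col = 0;
                pending_wrap = false;
            }
            // (the character would be drawn at row, col here)
            if (col == width - 1) {
                if (policy == WrapPolicy::Immediate) {
                    ++row;
                    col = 0;
                } else {
                    pending_wrap = true;
                }
            } else {
                ++col;
            }
        }
    }
    return Cursor{row, col};
}
```

On a 4-column screen, `feed("abcd\r\n", 4, WrapPolicy::Immediate)` ends on row 2, leaving a blank row behind, while the same bytes under `WrapPolicy::Deferred` end on row 1. Plain text without the newline lands in the same place under both policies.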

This creates a problem for programs which need to accurately keep track of the cursor position, even for apparently simple applications like echoing input. (Obviously, the easiest way to echo input is to let the terminal do that itself, but that precludes being able to implement extra control characters like tab completion.) Suppose the user has entered text right up to the right margin, and then types the backspace character to delete the last character typed. Normally, you could implement a backspace-delete by outputting a cub1 (move left 1) code and then an el (clear to end of line). (It's more complicated if the deletion is in the middle of a line, but the principle is the same.)
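
As a rough illustration of the backspace-delete described above, here is a minimal sketch that hard-codes the vt100 strings for cub1 (backspace, `\b`) and el (`ESC [ K`). A real program would look these up in the terminfo database rather than embedding them, and padding is ignored; `kCub1`, `kEl` and `erase_last_char` are hypothetical names.

```cpp
#include <string>

// Assumed vt100 capability strings; real code would query terminfo.
const std::string kCub1 = "\b";     // cub1: move cursor left one column
const std::string kEl = "\x1b[K";   // el: clear from cursor to end of line

// Remove the last character from the edit buffer and return the bytes
// that would erase it on screen. This is only valid while the cursor is
// known to be on the same screen line as the character being deleted.
std::string erase_last_char(std::string& line) {
    if (line.empty()) {
        return "";  // nothing to delete, nothing to send
    }
    line.pop_back();
    return kCub1 + kEl;
}
```

The caveat in the comment is exactly the problem at the right margin: if the terminal may already have moved the cursor to the next line, these two codes erase the wrong thing.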

However, if the cursor could possibly be at the beginning of the next line, this won't work. If you knew the cursor was at the beginning of the next, you could move up and then to the right before doing the el, but that wouldn't work if the cursor was still on the same line.

Historically, what was considered "correct" was to force the cursor to the next line with a hard return. (The following quote is taken from the file terminfo.src found in the ncurses distribution. I don't know who wrote it or when.)

# Note that the <xenl> glitch in vt100 is not quite the same as on the Concept,
# since the cursor is left in a different position while in the
# weird state (concept at beginning of next line, vt100 at end
# of this line) so all versions of vi before 3.7 don't handle
# <xenl> right on vt100. The correct way to handle <xenl> is when
# you output the char in column 80, immediately output CR LF
# and then assume you are in column 1 of the next line. If <xenl>
# is on, am should be on too.

But there is another way to handle the issue which doesn't require you to even know whether the terminal has the xenl "glitch" or not: output a space character, after which the terminal will definitely have line-wrapped, and then return to the leftmost column.
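
A minimal sketch of this trick follows. It is not readline's actual code: `echo_with_forced_wrap` is a hypothetical helper, and it assumes every character occupies exactly one column and that the caller knows the starting column (for example, the width of the prompt).

```cpp
#include <cassert>
#include <string>

// Build the byte stream to echo `text` starting at column `start_col` on a
// terminal `width` columns wide. Whenever a character fills the last column,
// a space (which forces the wrap on any terminal) and a carriage return
// (which moves back to column 0 of the new line) are appended.
std::string echo_with_forced_wrap(const std::string& text, int start_col, int width) {
    assert(width > 0 && start_col >= 0 && start_col < width);
    std::string out;
    int col = start_col;
    for (char ch : text) {
        out += ch;
        if (++col == width) {
            out += " \r";  // the next character overwrites this space
            col = 0;
        }
    }
    return out;
}
```

With a 20-column prompt on an 80-column terminal, the space and carriage return are emitted right after the 60th echoed character, which matches the position seen in the question's capture.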

As it turns out, this trick has another benefit if the terminal emulator is xterm (and probably other such emulators), which allows you to select a "word" by double-clicking on it. If the automatic line wrap happens in the middle of a word, it would be ideal if you could still select the entire word even though it is split over two lines. If you follow the suggestion in the terminfo file above, then xterm will (quite reasonably) treat the split word as two words, because they have an explicit newline between them. But if you let the terminal wrap automatically, xterm treats the result as a single word. (It does this despite the output of the space character, presumably because the space character was overwritten.)

In short, the space-then-carriage-return (SP CR) sequence is not in any way a standardized feature of the VT100 terminal. Rather, it is a pragmatic response to a specific feature of terminal descriptions combined with the observed behaviour of a specific (and common) terminal emulator. Variants of this code can be found in a variety of codebases, and although as far as I know it is not part of any textbook or formal documentation, it is certainly part of terminal-handling folkcraft [note 2].

In the case of readline, you'll find a comment in the code which is much more telegraphic than this answer: [note 1]

  /* If we're at the right edge of a terminal that supports xn, we're
     ready to wrap around, so do so.  This fixes problems with knowing
     the exact cursor position and cut-and-paste with certain terminal
     emulators.  In this calculation, TEMP is the physical screen
     position of the cursor. */

(xn is the short form of xenl.)


Notes

  1. The comment is at line 1326 of display.c in the current view of the git repository as I type this answer. In future versions it may be at a different line number, and the provided link will therefore not work. If you notice that it has changed, please feel free to correct the link.

  2. In the original version of this answer, I described this procedure as "part of terminal handling folklore", in which I used the word "folklore" to describe knowledge passed down from programmer to programmer rather than being part of the canon of academic texts and international standards. While "folklore" is often used with a negative connotation, I use it without such prejudice. "lore" (according to wiktionary) refers to "all the facts and traditions about a particular subject that have been accumulated over time through education or experience", and is derived from an Old Germanic word meaning "teach". Folklore is therefore the accumulated education and experience of the "folk", as opposed to the establishment: in Eric S. Raymond's analogy of the Cathedral and the Bazaar, folklore is the knowledge base of the Bazaar.

    This usage raised the eyebrows of at least one highly-skilled practitioner, who suggested the use of the word "esoteric" to describe this bit of information about terminal-handling. "Esoteric" (again according to wiktionary) applies to information "intended for or likely to be understood by only a small number of people with a specialized knowledge or interest, or an enlightened inner circle", being derived from the Greek ἐσωτερικός, "inner circle". (In other words, the knowledge of the Cathedral.)

    While the semantic discussion is, at least, amusing, I changed the text by using the hopefully less emotionally-charged word "folkcraft".

rici
  • Agreed. But the question is where? In all other respects the shell emits the expected color (ESC[nn;nn;) and cursor positioning commands (Esc[A and Esc[K for example). – J Evans Jul 12 '15 at 12:23
  • Rici, thanks. A sterling effort here with your answers and comments. Even if the result is not quite what I'd have liked :) However too much of a contribution to go un-thanked. Thank you. – J Evans Jul 28 '15 at 13:29
  • The comment predates any involvement by Eric Raymond, and is found in the 4.2BSD termcap file (see [source of tctest](https://github.com/ThomasDickey/tctest-snapshots/blob/master/testing/bsd42.tc#L1307)). The actual wording used in ncurses comes from the terminfo file which Raymond incorporated from SCO in 1995. – Thomas Dickey Sep 27 '16 at 09:06
  • @ThomasDickey: thanks, that's interesting. Before I fix my answer, I wonder if you have any insight about the sentence which starts "The correct way to handle xenl is...". As far as I can see, that sentence was not in the BSD files, so it would have been added by ESR (or someone). For the purposes of this discussion, that sentence seems like the key part of the quote. – rici Sep 27 '16 at 13:51
  • "Or someone" (we don't know who, but it would have been one of the companies with copyright notices, or a contractor for one of those). – Thomas Dickey Sep 27 '16 at 20:11
  • @thomas: ok, esr reference expunged. – rici Sep 27 '16 at 21:54

There is more than one reason for making line-wrapping a special case (and "folklore" seems an inappropriate term):

  • The xterm FAQ entry "That description of wrapping is odd, say more?" is one of many places discussing vt100 line-wrapping.
  • vim and screen both take care to not use cursor-addressing to avoid the wrapping, since that would interfere with selecting a wrapped line in xterm. Instead (and the sample seems to show bash doing this too) they send a series of printable characters which step across the margin before sending other control sequences which would prevent the line-wrapping flag from being set in xterm. This is noted in xterm's manual page:

    Logical words and lines selected by double- or triple-clicking may wrap across more than one screen line if lines were wrapped by xterm itself rather than by the application running in the window.

  • As for "comments in code" - there certainly are, to explain to maintainers what should not be changed. This from Sven Mascheck's XTerm resource file gives a good explanation:

    ! Wether this works also with _wrapped_ selections, depends on
    ! - the terminal emulator: Neither MIT X11R5/6 nor Suns openwin xterm
    ! know about that. Use the 'xfree xterm' or 'rxvt'. Both compile on
    ! all major platforms.
    ! - It only works if xterm is wrapping the line itself
    ! (not always really obvious for the user, though).
    ! - Among the different vi's, vim actually supports this with a
    ! clever and little hackish trick (see screen.c):
    !
    ! But before: vim inspects the _name_ of the value of TERM.
    ! This must be similar to "xterm" (like "xterm-xfree86", which is
    ! better than "xterm-color", btw, see his FAQ).
    ! The terminfo entry _itself_ doesn't matter here
    ! (e.g.: 'xterm' and 'vs100' are the same entry, but with
    ! the latter it doesn't work).
    !
    ! If vim has to wrap a word, it appends a space at the first part,
    ! this space will be wrapped by xterm. Going on with writing, vim
    ! in turn then positions the cursor again at the _beginning_ of this
    ! next line. Thus, the space is not visible. But xterm now believes
    ! that the two lines are actually a single one--as xterm _has_ done
    ! some wrapping also...

The comment which @rici quotes came from the terminfo file which Eric Raymond incorporated from SCO in 1995. The history section of the terminfo source refers to this. Some of the material in that file is based on the BSD termcap sources, but differs, as one would notice when comparing the BSD termcap in this section with ncurses. The four paragraphs beginning with the "not quite" sentence are the same (aside from line-wrapping) as in the SCO file. Here is a cut/paste from that file:

# # --------------------------------
#
# dec: DEC (DIGITAL EQUIPMENT CORPORATION)
#
# Manufacturer: DEC (DIGITAL EQUIPTMENT CORP.)
# Class:    II
# 
# Info:
#   Note that xenl glitch in vt100 is not quite the same as concept,
#   since the cursor is left in a different position while in the
#   weird state (concept at beginning of next line, vt100 at end
#   of this line) so all versions of vi before 3.7 don't handle
#   xenl right on vt100. The correct way to handle xenl is when
#   you output the char in column 80, immediately output CR LF
#   and then assume you are in column 1 of the next line. If xenl
#   is on, am should be on too.
#   
#   I assume you have smooth scroll off or are at a slow enough baud
#   rate that it doesn't matter (1200? or less). Also this assumes
#   that you set auto-nl to "on", if you set it off use vt100-nam 
#   below.
#   
#   The padding requirements listed here are guesses. It is strongly
#   recommended that xon/xoff be enabled, as this is assumed here.
#   
#   The vt100 uses rs2 and rf rather than is2/tbc/hts because the 
#   tab settings are in non-volatile memory and don't need to be 
#   reset upon login. Also setting the number of columns glitches 
#   the screen annoyingly. You can type "reset" to get them set.
#
# smkx and rmkx, given below, were removed. 
# smkx=\E[?1h\E=, rmkx=\E[?1l\E>,
# Somtimes smkx and rmkx are included.  This will put the auxilliary keypad in
# dec application mode, which is not appropriate for SCO applications.
vt100|vt100-am|dec vt100 (w/advanced video),

If you compare the two, the ncurses version has angle brackets added around the terminfo capability names, and a minor grammatical change was made in the first sentence. But the author of the comment clearly was not Raymond.

Thomas Dickey
  • I think you're overreacting here. "Folklore" refers to the body of knowledge not codified in patents, formal standards, academic literature and other such canons. In the model of the cathedral and the bazaar, folklore is the knowledge base of the bazaar; while it may not be "respectable" in bourgeois terms -- like the practitioners of the bazaar -- it is no less valuable as a result, and oftentimes more so. You'll find many academic references to "folklore algorithms"; none of those that I have seen have implied that these algorithms are in any way less useful... – rici Jul 11 '15 at 22:27
  • The definitions which I've found don't make that distinction. – Thomas Dickey Jul 11 '15 at 22:36
  • Perhaps you are searching in the cathedral :-) Here's a possible search: http://www.google.com.pe/search?q=Folklore+algorithm . But if it really bugs you, suggest an alternative word for my definition and I'll happily edit. – rici Jul 11 '15 at 23:09
  • Thomas, thank you. That is one page I had *not* visited during my travels. Still light on specifics though. I may have to try and find some DEC specs. – J Evans Jul 12 '15 at 12:28
  • For the singularly committed the specifications here make no mention of this behaviour either: http://www.vt100.net/docs/vt220-rm/ – J Evans Jul 12 '15 at 13:07