
Unicode has characters for START OF HEADING (␁ U+0001), START OF TEXT (␂ U+0002), END OF TEXT (␃ U+0003), and END OF TRANSMISSION (␄ U+0004). What's confusing about this is that, while there is a START OF HEADING character, there is no END OF HEADING character, and while there is an END OF TRANSMISSION character, there is no START OF TRANSMISSION character.

Where are these missing characters?

How should I go about representing the start of a transmission, or the end of a heading, using Unicode?

If the answer is "just use START OF HEADING in place of START OF TRANSMISSION," then what should I do if my "transmission" doesn't have a "heading"?

If the second part of the answer is "just use START OF TEXT in place of END OF HEADING," what happens if there is something between the heading and the text?†

† I can't imagine that this happens often (if ever), but I'm asking just in case someone out there ever tries to put something between the end of the heading and the start of their text.


Stack Exchange doesn't have a Unicode site, so I'm posting this here. If someone thinks that it would fit better on one of the other Network sites, please let me know in the comments.

Remy Lebeau
Ben Zelnick
  • The control characters you refer to are commonly used in binary communication protocols at an abstraction level that is unaware of characters and their encoding. If these protocols transfer suitably encoded Unicode, they handle the data as opaque payload; other protocol layers handle the payload as text. The inclusion in the ASCII and Unicode character sets is due to historical reasons (in 'prehistoric' times they were used to control the communication channel or the target device, effectively multiplexing a control stream and a payload stream). – collapsar Jun 28 '22 at 00:29
  • Hi, I commented under your MSE post and just wanted to give you a quick FYI... If you're interested consider: [2021: a year in moderation](https://meta.stackoverflow.com/q/415250) – bad_coder Jul 10 '22 at 21:32

1 Answer


The characters U+0000 to U+001F are imported directly from ASCII. If it didn't exist in ASCII, it doesn't exist in Unicode, in that range.
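To illustrate the point, here is a minimal Python sketch that checks that each code point in U+0000 to U+001F is a control character whose ASCII and UTF-8 encodings are the same single byte. The helper name `is_ascii_identical` is hypothetical and chosen only for this example.

```python
import unicodedata

def is_ascii_identical(cp: int) -> bool:
    """True if the code point is a control character (category Cc) whose
    ASCII and UTF-8 encodings are both the single byte with the same value."""
    ch = chr(cp)
    return (
        unicodedata.category(ch) == "Cc"
        and ch.encode("ascii") == bytes([cp])
        and ch.encode("utf-8") == bytes([cp])
    )

# Every code point in the C0 range U+0000..U+001F passes the check.
c0_matches = all(is_ascii_identical(cp) for cp in range(0x20))
```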

Most are obsolete; in-band delimiters are rarely used nowadays. If you're using an existing protocol with in-band delimiters, it will have its own rules based on ASCII usage; if you're designing a new protocol, there are probably better ways to proceed.

As far as I recall, typical usage needs no END OF HEADING, because the heading ends exactly where the text starts, so START OF TEXT marks both. There's presumably no need for a START OF TRANSMISSION either, because the first thing you receive after synchronization (start bits in asynchronous disciplines, SYN in synchronous ones) is the start of the transmission.
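As a rough sketch of that convention (not a real protocol), the Python below frames a message as an optional SOH plus heading, then STX, the text, and ETX, and closes the whole transmission with EOT. The function names `build_message` and `parse_message` and the rule that payloads may not contain the delimiters are assumptions made for this example. A message without a heading simply starts at STX.

```python
SOH, STX, ETX, EOT = "\u0001", "\u0002", "\u0003", "\u0004"

def build_message(text, heading=None):
    """Frame text (and an optional heading) with C0 control characters."""
    for part in (heading or "", text):
        if any(c in part for c in (SOH, STX, ETX, EOT)):
            raise ValueError("payload must not contain delimiter characters")
    head = SOH + heading if heading is not None else ""
    # STX both ends the heading (if any) and starts the text.
    return head + STX + text + ETX

def parse_message(frame):
    """Return (heading, text); heading is None when the frame has no SOH."""
    if not frame.endswith(ETX):
        raise ValueError("missing END OF TEXT")
    body = frame[:-1]
    heading = None
    if body.startswith(SOH):
        heading, sep, body = body[1:].partition(STX)
        if not sep:
            raise ValueError("missing START OF TEXT")
    elif body.startswith(STX):
        body = body[1:]
    else:
        raise ValueError("frame must start with SOH or STX")
    return heading, body

# Two messages, one with a heading and one without, closed by EOT.
transmission = build_message("hello", heading="to: B") + build_message("bye") + EOT
frames = [f + ETX for f in transmission[:-1].split(ETX) if f]
parsed = [parse_message(f) for f in frames]
```

Here `parsed` holds one `(heading, text)` pair per message, with `None` as the heading of the second message.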