Character code pages: control code page assignment that means "the next rendered character (in this source code) is escaped?"

Question

I acknowledge this question may be unanswerable, or extremely difficult to answer.

Also, although I expect this audience to be familiar with escape sequences in, e.g., scripting languages, I'll review the concept for reasons of clarity that will become apparent later in the post:

By "escaped," I mean, for example, printable characters that are interpreted as "Do not use the next character as usual; interpret it in another context." Such contexts include characters intended to be interpreted not as code but as literal printed characters, or conversely, characters that would usually be interpreted as literal characters but that we want interpreted as code instead. My examples (more confusingly, I now realize) use the latter case.

Specific example: a regex used with *nix sed which, when not escaped for sed (i.e., written in the extended syntax that sed -E accepts), is this:

([^0-9]*)(20[0-9]{2})([^0-9]{1,2})([0-9]{1,2})

But when escaped for sed's default basic regular expression (BRE) syntax, in which bare parentheses and braces are literal characters and only their backslash-escaped forms act as regex operators, the whole string becomes much uglier (and much less human-readable):

\([^0-9]*\)\(20[0-9]\{2\}\)\([^0-9]\{1,2\}\)\([0-9]\{1,2\}\)
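For comparison, here is the readable form handed to Python's re module, which, for this particular pattern, treats bare parentheses and braces as group and interval operators the way an ERE (sed -E) does. It is only a sketch to show what the four groups capture; the function name split_date and the sample filename are made up for illustration.

```python
import re

# The readable form from the question. For this pattern, Python's re module
# behaves like a POSIX ERE (sed -E): bare ( ) { } are group/interval operators.
PATTERN = r"([^0-9]*)(20[0-9]{2})([^0-9]{1,2})([0-9]{1,2})"


def split_date(name: str):
    """Return the four captured groups, or None if name does not match."""
    m = re.match(PATTERN, name)
    return m.groups() if m else None


# Hypothetical sample input.
print(split_date("report_2016-02-07.txt"))  # ('report_', '2016', '-', '02')
```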

Escape characters (or sequences) are one of the banes of programming. This is especially true for long strings (or lines of code), where the only practical options are to pay extreme attention or to use tools that create and remove escape sequences.
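As an example of such a tool, here is a minimal Python sketch (the name ere_to_bre is made up) that adds the backslashes sed's basic syntax needs before ( ) { } in a readable, ERE-style pattern, while copying bracket expressions untouched. It deliberately ignores other ERE-only operators such as +, ? and |, and does not handle POSIX classes like [[:digit:]] robustly.

```python
def ere_to_bre(ere: str) -> str:
    """Add the backslashes that sed's basic syntax needs before ( ) { }.

    Sketch only: other ERE operators (+ ? |) are left untouched, and
    POSIX classes such as [[:digit:]] are not handled robustly.
    """
    out = []
    i, n = 0, len(ere)
    while i < n:
        c = ere[i]
        if c == "\\" and i + 1 < n:
            nxt = ere[i + 1]
            # In an ERE, \( means a literal "("; in a BRE a bare "(" is literal.
            out.append(nxt if nxt in "(){}" else c + nxt)
            i += 2
        elif c == "[":
            # Copy a bracket expression verbatim; ( ) { } are literal inside it.
            j = i + 1
            if j < n and ere[j] == "^":
                j += 1
            if j < n and ere[j] == "]":  # a leading "]" is a member, not the end
                j += 1
            while j < n and ere[j] != "]":
                j += 1
            out.append(ere[i:j + 1])
            i = j + 1
        elif c in "(){}":
            out.append("\\" + c)  # operator in ERE -> backslashed operator in BRE
            i += 1
        else:
            out.append(c)
            i += 1
    return "".join(out)


readable = r"([^0-9]*)(20[0-9]{2})([^0-9]{1,2})([0-9]{1,2})"
print(ere_to_bre(readable))
# \([^0-9]*\)\(20[0-9]\{2\}\)\([^0-9]\{1,2\}\)\([0-9]\{1,2\}\)
```

A tool like this only moves the problem, of course: the escaped form still has to be read somewhere, which is the readability issue the question is about.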

I've looked around and haven't found a solution like the one I'll propose, but since I don't know what such a thing would be called (if it exists) and I'm not an expert, my search was futile.

Where I say things like "control code page assignment," I mean code pages in the sense of tables of printable (and non-printable) characters that computers use to render text and control its layout, as explained in the Wikipedia article "Code page". You could (loosely) call these "computer alphabets." By "code page assignment," I mean an entry in the computer's "alphabet" that is interpreted either as a rendered glyph (a printable character) or as an unprinted control code (a non-printable character).

The idea is to designate a specific, unprinted control code page assignment to mean "interpret the next character as escaped." The text renderer could "read" it and signal it to the programmer by changing, e.g., the color and/or brightness of the escaped character that follows the control code. Alternatively (or additionally), the assignment could be a printable glyph, for example a standardized, unobtrusive accent that doesn't conflict with any accents used in alphabets related to the Roman alphabet.

This unprinted code page assignment would also be read by interpreters and compilers similarly.

Consider a rendered version of a longer regex than the one above:

If we had an unprinted code page assignment that means "the next character is escaped," the escaped characters could for example simply be rendered brighter, to indicate they are escaped:
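A minimal sketch of such a renderer, under a loud assumption: no standard code point means "the next character is escaped," so the hypothetical marker here is U+E000 from Unicode's Private Use Area, and "brighter" is approximated with the ANSI bold attribute on a terminal. The names ESCAPE_NEXT and render are illustrative only.

```python
# Hypothetical: no standard code point means "the next character is escaped",
# so this sketch borrows U+E000 from Unicode's Private Use Area.
ESCAPE_NEXT = "\ue000"
BOLD, NORMAL = "\x1b[1m", "\x1b[22m"  # ANSI SGR: bold on / normal intensity


def render(text: str) -> str:
    """Hide each marker and show the character that follows it in bold."""
    out = []
    chars = iter(text)
    for c in chars:
        if c == ESCAPE_NEXT:
            nxt = next(chars, "")  # a trailing marker escapes nothing
            if nxt:
                out.append(BOLD + nxt + NORMAL)
        else:
            out.append(c)
    return "".join(out)


# Mark every ( ) { } in a (hypothetical) BRE as escaped, then render it.
source = "".join(ESCAPE_NEXT + c if c in "(){}" else c for c in "(20[0-9]{2})")
print(render(source))  # on an ANSI terminal, ( { } ) appear in bold
```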

That is far easier for a human to interpret (albeit a regex is hard to interpret to begin with) than the following, which instead uses printed characters for escape sequences:

The predominant, if not universal, practice as I write this is to use printed characters in escape sequences, not unprinted code page assignments.

One problem attendant on the proposed solution would be getting the many tools programmers use to conform to the escape code page assignment. Programmers would also have to know which utilities support it and which don't. It would also be best for any tool adopting such an assignment to state explicitly whether it is backward compatible, i.e., whether it accepts both printed escape characters and the unprinted code page assignment.
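Backward compatibility could be as simple as a lossless translation between the two notations. In this sketch the marker is a hypothetical stand-in (U+E000, from Unicode's Private Use Area) and the function names are illustrative: one direction produces the backslash form that existing tools such as sed expect, the other turns backslash-escaped text into the marker form. It assumes marker-form text contains no unmarked backslashes.

```python
# Hypothetical marker meaning "the next character is escaped"; U+E000 is a
# Private Use Area code point chosen only for illustration.
ESCAPE_NEXT = "\ue000"


def to_backslashes(text: str) -> str:
    """Marker form -> backslash form that existing tools such as sed expect.

    Assumes text has no unmarked backslashes; from_backslashes never creates
    one except a lone trailing backslash.
    """
    return text.replace(ESCAPE_NEXT, "\\")


def from_backslashes(text: str) -> str:
    """Backslash form -> marker form; a lone trailing backslash is kept as-is."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            out.append(ESCAPE_NEXT + text[i + 1])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


bre = r"\([^0-9]*\)\(20[0-9]\{2\}\)"
marked = from_backslashes(bre)
print("\\" in marked)                # False
print(to_backslashes(marked) == bre)  # True
```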

My preference would be for a programming language or tool that accomplishes this specifically with an escape control code page assignment rather than by some other means. All the same, I'd be very curious about any tools that do something like this.

So after all of that, my question is: what programming languages exist that do this, and/or is there already a code page assignment that does this?

I forgot to mention another problem that makes me want this proposed solution: in some settings, different characters require different escape characters (DOS is horrible that way); this would eliminate the need to look up or remember which character escapes which. However, following on from the answers given here, a configurable-enough editor could be taught the escape sequences (including different sequences for different characters) and pretty-print them; it could also be configured to suggest them. — Alex Hall, Feb 10 '16 at 00:20

comingstorm · Answer 1 · score 3 · 2016-02-08T04:38:44.287

I'm not aware of any programming language that does what you're suggesting. The problem with storing your program in a non-printable text format is that your users are then stuck using only tools that understand that particular non-printable text format.

Also, suppose you settle on a particular, non-printable control character to indicate escaped characters. Then, how would you conveniently type them? If you need to type a special key to escape a character, you can just as easily make it a backslash. After all, you can pretty-print printable characters as easily as non-printable ones -- as long as you design your language syntax so that your smart editor can correctly identify which literals need to be pretty-printed.

edited Feb 08 '16 at 04:38

answered Feb 08 '16 at 04:32

comingstorm


Right, you mention the problem I described as "..ensuring conformity to the escaped code page assignment by so many tools.." You also mention a problem I meant to but forgot to: how would such a non-printable escape character be typed? That is another problem presented by this solution I propose. You remind me of another problem that makes me desire my proposed solution, which I'll mention in a comment on my question. And yes, as svick also mentions, a configurable-enough editor could pretty-print escape sequences. – Alex Hall Feb 10 '16 at 00:17

score 2 · Accepted Answer · answered Feb 07 '16 at 22:33

As far as I'm aware, pretty much all programming languages stick to printable ASCII characters*.
There is already a special escape control character in ASCII, called, unsurprisingly, Escape or ESC (the similarity to the Esc key is not accidental): code 27, or 0x1B. But this character is not really used this way anymore.
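For reference, a small Python sketch of where ESC shows up today: as far as I know, its main remaining use is as the first byte of terminal control sequences (ECMA-48, commonly called ANSI escape codes), where it announces that the characters after it form a command rather than text. The helper name bold is made up.

```python
ESC = "\x1b"  # ASCII code 27 (0x1B), the "Escape" control character


def bold(text: str) -> str:
    """Wrap text in ESC [ 1 m (bold on) and ESC [ 0 m (reset all attributes)."""
    return f"{ESC}[1m{text}{ESC}[0m"


print(ord(ESC), hex(ord(ESC)))  # 27 0x1b
print(bold("escaped"))          # bold on a terminal that understands ANSI codes
```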
I think you could get pretty close to what you want with just syntax highlighting.
If you're willing to break the direct correspondence between bytes in the file you're editing and characters you see on the screen, then I think \ can remain the escape character. You just need to find an editor that's configurable enough and configure it the way you want.
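A rough sketch of what such an editor would have to do, using a hypothetical display function: hide each escaping backslash, mark the character after it for highlighting, and keep a map from displayed characters back to positions in the source, because once backslashes are hidden, screen columns no longer line up with the file's contents. Rendering uses ANSI bold as a stand-in for editor highlighting.

```python
BOLD, NORMAL = "\x1b[1m", "\x1b[22m"  # ANSI SGR: bold on / normal intensity


def display(source: str):
    """Hide escaping backslashes; return (shown, escaped_flags, source_offsets).

    source_offsets[k] is the index in `source` of displayed character k, which
    an editor would need in order to map the cursor back to the file.
    """
    shown, flags, offsets = [], [], []
    i = 0
    while i < len(source):
        if source[i] == "\\" and i + 1 < len(source):
            shown.append(source[i + 1])
            flags.append(True)
            offsets.append(i + 1)
            i += 2
        else:
            shown.append(source[i])
            flags.append(False)
            offsets.append(i)
            i += 1
    return "".join(shown), flags, offsets


def to_ansi(shown: str, flags: list) -> str:
    """Render escaped characters in bold for an ANSI terminal."""
    return "".join(BOLD + c + NORMAL if f else c for c, f in zip(shown, flags))


text, flags, offsets = display(r"\(20\)")
print(text)     # (20)
print(offsets)  # [1, 2, 3, 5]
print(to_ansi(text, flags))
```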

* The two main exceptions I can think of are not interesting here: APL with its own set of symbols and languages supporting Unicode in identifiers.

"languages supporting Unicode in identifiers": you mean like Java, C#, VB.NET, JavaScript, …. — Tom Blodget, Feb 07 '16 at 23:14
Thanks. I'll wait a few days, so that as many eyes as possible can look at this, before I mark an accepted answer (but I'll be very surprised if it turns out any programming language does this). Syntax highlighting could work well if I can also configure it to, e.g., slightly vary the color of characters the editor "thinks" may be unescaped, and (similar to what I wrote) somewhat brighten both suspected unescaped characters and definitely escaped characters. — Alex Hall, Feb 08 '16 at 06:58
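The "suspected unescaped" idea could be sketched for sed's basic syntax like this (the classify_bre name and its three labels are made up): backslash-escaped characters are labeled escaped, and bare ( ) { } outside bracket expressions, which a BRE treats as literals, are labeled suspect because they may be a forgotten backslash, such as a \{1,2} whose closing brace lost its backslash. An editor could then map each label to a color.

```python
def classify_bre(bre: str) -> list:
    """Label each displayed character of a sed BRE as plain, escaped or suspect.

    escaped: the character follows a backslash.
    suspect: a bare ( ) { } outside brackets; a BRE treats it as a literal, so
             it may be a forgotten backslash.
    """
    out = []
    i, n = 0, len(bre)
    while i < n:
        c = bre[i]
        if c == "\\" and i + 1 < n:
            out.append((bre[i + 1], "escaped"))
            i += 2
        elif c == "[":
            # Bracket expressions are taken verbatim; their contents are plain.
            j = i + 1
            if j < n and bre[j] == "^":
                j += 1
            if j < n and bre[j] == "]":
                j += 1
            while j < n and bre[j] != "]":
                j += 1
            out.extend((ch, "plain") for ch in bre[i:j + 1])
            i = j + 1
        else:
            out.append((c, "suspect" if c in "(){}" else "plain"))
            i += 1
    return out


labels = classify_bre(r"\([0-9]\{1,2}\)")
print([ch for ch, kind in labels if kind == "suspect"])  # ['}']
```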
