Not sure if this fits here. If there is something like "Computer history and future" please direct me there.
Question
Since the rise of computers, have there been any character encodings (or markup languages on top of them) that differentiate between uppercase and lowercase letters not by defining the entire alphabet twice (once in capitals and once in lowercase), but by adding a modifier or keyword that specifies that a character is in a specific case?
Why Would Someone Do This?
Maybe to encode text in less space, or simply because the authors considered the choice between ABC and abc more cosmetic than meaningful, which brings me to a lengthy and philosophical background explanation, see the next section:
Skip everything from here if you are not interested in how I came up with this question.
Representation and Meaning
"Modern" encodings like ASCII and UTF-8 differentiate between uppercase and lowercase by assigning individual code points to each. This fundamental decision is so ubiquitous today, that concepts like case sensitivity appear rather natural to us. But when comparing Morse code, ASCII and Unicode, there are are a lot of distinctions that were traditionally stored in markup languages on top of the plain text encoding (e.g. rtf, tex, html, doc) but could be stored in plain text today:
- Letter casing ABC, abc
- Style ABC, 𝐀𝐁𝐂, 𝐴𝐵𝐶, 𝔸𝔹ℂ, 𝖠𝖡𝖢, 𝔄𝔅ℭ, 𝓐𝓑𝓒
- Decorations ABC, A̲B̲C̲, A̶B̶C̶, A̷B̷C̷
- Color
Very old encodings like Braille and Morse code do not encode letter casing, but ASCII does. In fact, it forces you to pick either a capital or a lowercase letter for every character: there is no case-neutral default if you don't care.
Unicode and its UTF encodings continued on that route, often forcing you to differentiate not only between letter cases, but also between regular, italic, bold; sans-serif, serif; script, Fraktur; and more. But Unicode also supports modifiers. Instead of defining the entire alphabet again just for underlined/colored/... letters, there are combining characters that behave similarly to keywords in markup languages: a special (sequence of) code point(s) indicates that the symbol it is attached to should be underlined / have a different color / ... .
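For illustration, here is a minimal Python sketch using two real combining characters, U+0332 (COMBINING LOW LINE) and U+0336 (COMBINING LONG STROKE OVERLAY): the base letters stay plain A, B and C, and the underlined or struck-through look comes entirely from the modifier code points that follow them.

```python
# Combining characters act like inline "modifiers": they follow a base
# character instead of requiring a separate underlined/struck alphabet.
UNDERLINE = "\u0332"  # COMBINING LOW LINE
STRIKE = "\u0336"     # COMBINING LONG STROKE OVERLAY

def decorate(text: str, modifier: str) -> str:
    """Append the combining modifier after every character of text."""
    return "".join(ch + modifier for ch in text)

print(decorate("ABC", UNDERLINE))                    # A̲B̲C̲
print(decorate("ABC", STRIKE))                       # A̶B̶C̶
print(len("ABC"), len(decorate("ABC", UNDERLINE)))   # 3 6 -> twice the code points
```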
Unicode aims at encoding meaning, not representation. We have all these seemingly cosmetic variants in Unicode because they convey a different meaning to someone. However, the more "meaningful" distinctions are made, the more I get the feeling that standardizing meaning without representation is impossible. Some examples:
Purely cosmetic representation that became standardized meaning
- Lowercase letters were invented as a script. If lowercase were invented today, we would simply call it a "font" and consider it purely cosmetic.
- Mathematicians used bold letters for non-scalar variables. As writing bold by hand was tedious, some teachers drew only the outline of those bold letters, resulting in double-struck "blackboard bold" fonts (ℂ). Nowadays, the styles N and ℕ imply very different things to mathematicians.
Standardized meaning that changed based on the representation
- You might be irritated at your grandma writing things like "Your grandpa died 😂" because she misinterpreted the emoji as being sad. But let's be honest: do you really know the standardized meaning of emojis, or do you simply use them like the people around you, turning them into a mix of inside jokes and a full-blown cant? 🍆 and 🍑 might be popular emojis, but not because they are standardized as eggplant and peach. And if you are using 😤 to express anger, are you really better than your grandma?
An obscure mix of both
- The variant pi ϖ probably started out as a cursive/curly lowercase pi π, but might have been misread as an overlined lowercase omega ω and is therefore known as pomega and drawn more like an omega than a pi.
In an alternate universe ...
I wondered if history could have taken another turn, where people looked at these problems and thought: You know what? We cannot tell cosmetics and meaning apart. So let's try to create an encoding for the plainest of plain texts, where you cannot even distinguish between uppercase and lowercase. Then add another encoding or markup language on top that offers tons of modifiers or keywords to express whatever cosmetics you like.
In such a world, "plain text" could mean something like "a sequence of regular keystrokes", where computer keyboards send standardized and internationally unique scan codes.
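To sketch what such a layered design could look like (purely hypothetical, not any historical encoding; the CAPS_NEXT code point is invented for illustration), the plain-text layer below stores only caseless letters, and the markup layer inserts an explicit "capitalize the next letter" modifier instead of duplicating the alphabet:

```python
# Hypothetical sketch: plain text is caseless; a modifier marks capitals.
CAPS_NEXT = "\x01"   # made-up modifier code point, not part of any real standard

def encode_caseless(text: str) -> str:
    """Store text as caseless letters plus explicit case modifiers."""
    out = []
    for ch in text:
        if ch.isupper():
            out.append(CAPS_NEXT + ch.lower())
        else:
            out.append(ch)
    return "".join(out)

def render(encoded: str) -> str:
    """Re-apply the case modifiers for display."""
    out, caps = [], False
    for ch in encoded:
        if ch == CAPS_NEXT:
            caps = True
        else:
            out.append(ch.upper() if caps else ch)
            caps = False
    return "".join(out)

print(repr(encode_caseless("Hello World")))        # '\x01hello \x01world'
print(render(encode_caseless("Hello World")))      # 'Hello World'
```

The modifier costs one extra code point per capital letter, but the base alphabet then needs only 26 letter codes plus one modifier instead of 52 separate codes, which is exactly the trade-off hinted at under "Why Would Someone Do This?".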