1

The design seems "not of one mind" here: 16-bit integer data and 16-bit character data are now distinguishable, but 8-bit integer data and 8-bit character data are not.

In C++, the only choice for an 8-bit value has always been some flavor of `char`. Recognizing `wchar_t` as an official type, distinct from `unsigned short`, enables improvements, but only for wide-string users. This feels uncoordinated: the language behaves differently for 8-bit and 16-bit values.

I think there is clear value in having more distinct types; having a distinct 8-bit `char` and an 8-bit `byte` would be much nicer, e.g. for operator overloading. For example:

// This kind of sucks...
BYTE m = 59;     // This is really 'unsigned char' because there is no other option
cout << m;       // outputs character data ";" because it assumes 8-bits is char data.
                 // This is a consequence of limited ability to overload

// But for wide strings, the behavior is different and better...
unsigned short s = 59;
wcout << s;      // Prints the number "59" like we expect
wchar_t w = L'C';
wcout << w;      // Prints out "C" like we expect

The language would be more consistent if a new 8-bit integer type were introduced. That would enable more intelligent overloads, and overloads that behave the same way regardless of whether you are using narrow or wide strings.

VoidStar
  • 5
    Would it blow your mind if I told you that `wchar_t` is not required to be 16 bits? Or even to be 2 bytes in an assuredly `CHAR_BIT==8` environment? – Lightness Races in Orbit Nov 07 '14 at 00:56
  • 4
    Or that `wchar_t`/`unsigned short` have always been distinct? – Mooing Duck Nov 07 '14 at 00:58
  • I suppose it doesn't really matter how wide wchar_t is, the point is, it cannot be confused with an integer, unlike char. Also, char16_t and char32_t are things too. – VoidStar Nov 07 '14 at 01:09
  • 16-bit `wchar_t` sounds like a frightening alternate reality.. – Cubbi Nov 07 '14 at 03:38
  • If you were designing the language from the ground up, with no concern for backward compatibility, there would be a lot to say for using (for one possibility) template-like notation, so you could have `int<N>` or `char<N>`, where `N` could be any power of 2 from 8 to at least 64. With only two names, you'd supply all of `char`, `short`, `int`, `long`, and `long long`, and it would be simple and systematic, so the distinction between integers and character types would be decoupled from the size. Hasn't happened, and given backward compatibility constraints, it probably never will either. – Jerry Coffin Nov 08 '14 at 02:01

1 Answer

2

Yes, probably, but using single-byte integers that aren't characters is pretty rare and you can trivially get around your stated problem via integral promotion (try applying a unary + and see what happens).

It's also worth noting that your premise is flawed: wchar_t and unsigned short have always been distinct types, per paragraph 3.9.1/5 in C++98, C++03, C++11 and C++14.

Lightness Races in Orbit
  • 1
    Only today I wanted to output some bytes as hexadecimals and found myself looking at a text file of smiley faces and weird letters. – Neil Kirk Nov 07 '14 at 01:02
  • @NeilKirk: Yeah I get that occasionally. Not often enough or with great enough damage to warrant a whole new type, I don't think (omg can you imagine the backward-compatibility nightmare with a new `byte` keyword? `byte_t` perhaps but ew and why is that not the same as `char` and omg this is horrible) ... yeah, I just found a few reasons right there in the parenthetical. And besides all that, there has to be a very good reason to add something to the language, not the other way around. Though I'd be amazed if this hadn't already been proposed. Point is, there's no answerable question here. – Lightness Races in Orbit Nov 07 '14 at 01:04
  • 1
    The problem is where the integer type is a template parameter and I want to output the integers to streams. I suppose I could put + on them, didn't think of that. – Neil Kirk Nov 07 '14 at 01:05
  • @NeilKirk: That's what I do. Not ideal, I admit. – Lightness Races in Orbit Nov 07 '14 at 01:07
  • Thanks for the tip, this will really help me tidy up some code tomorrow. – Neil Kirk Nov 07 '14 at 01:08
  • I don't think 8-bit integers are rare, at least in the embedded stuff I do. I see them in structs a lot, and there are lots of byte arrays. Since they just added char16_t and char32_t, it feels like they can add new types whenever somebody feels like they'd be nice. – VoidStar Nov 07 '14 at 01:12
  • @VoidStar: "I don't think they're rare, at least not within the bubble of the scenarios that pop up (which are rare)." I paraphrase your vacuous statement. :) It's like saying "I don't think winning the lottery is rare, at least not within the group of people who have won the lottery." – Lightness Races in Orbit Nov 07 '14 at 10:38