How to work with uint8_t instead of char?

Question

I wish to understand the situation regarding uint8_t vs char, portability, bit-manipulation, the best practices, state of affairs, etc. Do you know a good reading on the topic?

I wish to do byte-IO. But of course char has a more complicated and subtle definition than uint8_t; which I assume was one of the reasons for introducing stdint header.

However, I had problems using uint8_t on multiple occasions. A few months ago, once, because iostreams are not defined for uint8_t. Isn't there a C++ library doing really-well-defined-byte-IO i.e. read and write uint8_t? If not, I assume there is no demand for it. Why?

My latest headache stems from the failure of this code to compile:

uint8_t read(decltype(cin) & s)
{
    char c;
    s.get(c);
    return reinterpret_cast<uint8_t>(c);
}

error: invalid cast from type 'char' to type 'uint8_t {aka unsigned char}'

Why the error? How to make this work?

`decltype(cin)` seems insane. You really want `std::istream &` there, nothing else. — Kerrek SB, Oct 05 '14 at 12:51
If I am not mistaken, static_cast preserves the numerical value, possibly by using a runtime conversion. I want to preserve the exact bit sequence (I am reading the MBR of a disk). — peterf, Oct 05 '14 at 12:56

Kerrek SB · Accepted Answer · 2014-10-05T13:21:59.947

The general, portable, roundtrip-correct way would be to:

1. demand in your API that all byte values can be expressed with at most 8 bits,
2. use the layout-compatibility of char, signed char and unsigned char for I/O, and
3. convert unsigned char to uint8_t as needed.

For example:

bool read_one_byte(std::istream & is, uint8_t * out)
{
    unsigned char x;    // a "byte" on your system 
    if (is.get(*reinterpret_cast<char *>(&x)))   // get(char &) takes a reference, so dereference the cast pointer
    {
        *out = x;
        return true;
    }
    return false;
}

bool write_one_byte(std::ostream & os, uint8_t val)
{
    unsigned char x = val;
    // the stream's operator bool is explicit since C++11, so convert explicitly
    return static_cast<bool>(os.write(reinterpret_cast<char const *>(&x), 1));
}

Some explanation: Rule 1 guarantees that values can be round-trip converted between uint8_t and unsigned char without losing information. Rule 2 means that we can use the iostream I/O operations on unsigned char variables, even though they're expressed in terms of chars.

We could also have used is.read(reinterpret_cast<char *>(&x), 1) instead of is.get() for symmetry. (Using read in general, for stream counts larger than 1, also requires the use of gcount() on error, but that doesn't apply here.)
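For counts larger than 1, the read/gcount() combination mentioned above could look roughly like the sketch below. The helper name read_bytes is made up for illustration and is not part of the original answer; it simply returns however many bytes the stream actually delivered.

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the answer): reads up to `count` bytes
// and returns only the bytes that were actually extracted.
std::vector<std::uint8_t> read_bytes(std::istream& is, std::size_t count)
{
    std::vector<unsigned char> raw(count);
    is.read(reinterpret_cast<char*>(raw.data()),
            static_cast<std::streamsize>(count));
    // read() sets failbit on a short read, so ask gcount() how many
    // bytes really arrived instead of trusting `count`.
    raw.resize(static_cast<std::size_t>(is.gcount()));
    // Converting unsigned char to uint8_t is lossless under rule 1.
    return std::vector<std::uint8_t>(raw.begin(), raw.end());
}
```

A caller would compare the size of the returned vector with the requested count to tell a complete read from a short one.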

As always, you must never ignore the return value of I/O operations. Doing so is always a bug in your program.
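As a rough illustration of checking every return value, here is a self-contained round-trip sketch. It restates the two helpers so that the block compiles on its own; roundtrip_all_bytes is a hypothetical name, and an in-memory std::stringstream stands in for a real file or device.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>

bool read_one_byte(std::istream& is, std::uint8_t* out)
{
    unsigned char x;
    if (is.get(reinterpret_cast<char&>(x)))   // get(char&) takes a reference
    {
        *out = x;
        return true;
    }
    return false;
}

bool write_one_byte(std::ostream& os, std::uint8_t val)
{
    unsigned char x = val;
    return static_cast<bool>(os.write(reinterpret_cast<char const*>(&x), 1));
}

// Hypothetical check: writes every 8-bit value, reads each one back,
// and reports failure as soon as any I/O call or comparison fails.
bool roundtrip_all_bytes()
{
    std::stringstream buf(std::ios::in | std::ios::out | std::ios::binary);
    for (int v = 0; v < 256; ++v)
        if (!write_one_byte(buf, static_cast<std::uint8_t>(v)))
            return false;
    for (int v = 0; v < 256; ++v)
    {
        std::uint8_t got = 0;
        if (!read_one_byte(buf, &got) || got != v)
            return false;
    }
    std::uint8_t extra = 0;
    return !read_one_byte(buf, &extra);   // the stream must now be exhausted
}
```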

It is relevant to the questions because in the question comments use of static_cast is discussed but OP says it is unsuitable, as do you. But I don't understand why. — Neil Kirk, Oct 05 '14 at 13:51
@NeilKirk static_cast, reinterpret_cast and dynamic_cast are used primarily for documenting the programmer's intent; the actual effect on the code might be the same in some cases, but not in all. — peterf, Oct 06 '14 at 18:13
@Kerrek What is the technical reason for the compiler error? Why can't I sedate the type-system and reinterpret a char as a uint8_t? Why do I need a whole pointer to get a single byte out? — peterf, Oct 06 '14 at 18:21

Columbo · Answer 2 · 2014-10-05T13:22:14.210

A few months ago, once, because iostreams are not defined for uint8_t.

uint8_t is pretty much just a typedef for unsigned char. In fact, I doubt you could find a machine where it isn't.

uint8_t read(decltype(cin) & s)
{
    char c;
    s.get(c);
    return reinterpret_cast<uint8_t>(c);
}

Using decltype(cin) instead of std::istream has no advantage at all, it is just a potential source of confusion. The cast in the return-statement isn't necessary; converting a char into an unsigned char works implicitly.
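On the bit-pattern worry raised in the comments: reinterpret_cast only permits a fixed list of conversions, and converting one integer type to a different integer type is not among them, hence the error. The sketch below suggests that a plain value conversion already keeps the bits intact. It assumes a two's complement platform (which C++20 makes mandatory) and that uint8_t is unsigned char; to_byte and same_bits are hypothetical names used only for this illustration.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: a value conversion from char to uint8_t.
// The result is the value modulo 256, which on two's complement
// hardware has the same bit pattern as the original char.
std::uint8_t to_byte(char c)
{
    return static_cast<std::uint8_t>(c);
}

// Hypothetical check: compares the object representations of the
// char and of the converted byte.
bool same_bits(char c)
{
    std::uint8_t converted = to_byte(c);
    return std::memcmp(&c, &converted, 1) == 0;
}
```

Under those assumptions same_bits should hold for every possible char value, so nothing is lost by dropping the reinterpret_cast.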

A few months ago, once, because iostreams are not defined for uint8_t.

They are. Not for uint8_t itself, but most certainly for the type it actually represents. operator>> is overloaded for unsigned char. This code works:

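uint8_t read(std::istream& s)
{
    // get() returns an int; at end-of-file it returns EOF, which is
    // silently narrowed here, so check the stream state after calling
    return s.get();
}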

Since unsigned char and char can alias each other you can also just reinterpret_cast any pointer to a char string to an unsigned char* and work with that.
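A minimal sketch of that pointer cast, assuming the data already sits in a char buffer (a std::string here); sum_bytes is a made-up helper, used only to show each element being read as an unsigned value.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: views a char buffer through an unsigned char
// pointer and adds up the bytes as values in the range 0..255.
unsigned sum_bytes(const std::string& s)
{
    const unsigned char* p =
        reinterpret_cast<const unsigned char*>(s.data());
    unsigned total = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
        total += p[i];   // no sign surprises: p[i] is never negative
    return total;
}
```

Reading the same buffer through a plain char pointer could yield negative values on platforms where char is signed, which is the usual reason to go through unsigned char for byte work.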

In case you want the most portable way possible, take a look at Kerrek's answer.

"Since unsigned char and char can alias each other" can you explain? — Neil Kirk, Oct 05 '14 at 13:24
@NeilKirk According to [basic.lval]/10 you can access the stored value of an object through a glvalue of type `char` or `unsigned char`. So you can access any stored element of a `char` array through a glvalue of type `unsigned char`. You can also modify those elements through such a glvalue and access them with a `char`-glvalue again. — Columbo, Oct 05 '14 at 14:05
