When I move data between a file and an in-memory buffer, I tend to use std::vector<unsigned char> and treat the unsigned chars as plain bytes that can hold anything, so:

int sizeoffile = 16;
std::vector<unsigned char> buffer(sizeoffile);

std::ifstream inFile("somefile", std::ios::binary | std::ios::in);
inFile.read(buffer.data(), sizeoffile); // Argument of type unsigned char* is incompatible
                                        // with parameter of type char*

The first argument of ifstream::read() wants a char pointer, but my vector's buffer is unsigned char. What sort of cast is suitable here to read the data into my buffer? It's essentially an unsigned char* to char* conversion. I can do it with reinterpret_cast or a C-style cast, but that makes me think I'm doing something wrong, as these casts are rarely recommended. Have I made the wrong choice of data type (unsigned char) for my buffer?

HolyBlackCat
Zebrafish
  • 3
    `reinterpret_cast` is the right option. A C-style cast would work too, but you normally want to avoid C-style-casting to pointers and references, for safety reasons. – HolyBlackCat Jun 19 '20 at 16:04
  • "I tend to use `std::vector<unsigned char>`" why not use `std::vector<char>`? – Caleth Jun 19 '20 at 16:06
  • @HolyBlackCat OK, it's just that I've seen a lot of people saying that you're probably doing something wrong if you use reinterpret_cast, so that had me doubting. What's the danger with C-style casts with pointers and references? – Zebrafish Jun 19 '20 at 16:07
  • @Caleth Yes, I can, it doesn't make any difference as long as the data type is a single byte size, right? – Zebrafish Jun 19 '20 at 16:09
  • 1
    The danger of a C style cast is accidentally `reinterpret_cast`ing or `const_cast`ing when you meant to `static_cast` – Caleth Jun 19 '20 at 16:09
  • @Zebrafish Indeed, you are probably doing something wrong if you use `reinterpret_cast`. This is one of those cases where it is safe: casting between pointers of `unsigned char`, `char`, and `std::byte`. – Justin Jun 19 '20 at 16:09
  • 2
    @Zebrafish It's one of the few cases where it's the right tool. The problem with C-style casts to pointers/references is that you can inadvertently cast away cv-qualifiers (`const`, and less often `volatile`). Also, when dealing with inheritance and casting from base to derived and back, a C-style-cast functions as a `static_cast`; but if you accidentally C-style cast to an unrelated class, you silently get the behavior of a `reinterpret_cast`, instead of an error you'd get with a `static_cast`. – HolyBlackCat Jun 19 '20 at 16:13
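
The approach the comments settle on can be sketched as follows. This is a minimal illustration, not code from the thread: `read_bytes` and `read_bytes_example` are hypothetical names, and an in-memory `std::istringstream` stands in for the `std::ifstream` in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): read up to `count` raw bytes.
std::vector<unsigned char> read_bytes(std::istream& in, std::size_t count)
{
    std::vector<unsigned char> buffer(count);
    // Accessing unsigned char storage through a char* is permitted,
    // so reinterpret_cast is the appropriate named cast here.
    in.read(reinterpret_cast<char*>(buffer.data()),
            static_cast<std::streamsize>(buffer.size()));
    // gcount() reports how many bytes were actually extracted.
    buffer.resize(static_cast<std::size_t>(in.gcount()));
    return buffer;
}

// Usage on an in-memory stream; a std::ifstream opened with
// std::ios::binary could be passed in the same way.
inline void read_bytes_example()
{
    std::istringstream in(std::string("\x01\xff\x10", 3));
    const std::vector<unsigned char> bytes = read_bytes(in, 16);
    assert(bytes.size() == 3);   // only three bytes were available
    assert(bytes[1] == 0xff);    // values above 127 survive unchanged
}
```

Shrinking the vector to `gcount()` is a design choice made for this sketch; the original snippet assumes the file holds at least `sizeoffile` bytes.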

2 Answers

The safest thing to do will be not to use a cast directly, but to use a helper template that restricts itself to casting between types with compatible representations.

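// Requires <type_traits> (C++17). std::make_signed_t plays the role of "remove unsigned".
template<typename T, typename U>
auto treat_as(U* ptr) -> std::enable_if_t< std::is_same_v< std::make_signed_t<T>, std::make_signed_t<U> >, T >*
{ return reinterpret_cast<T*>(ptr); }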

and then

inFile.read(treat_as<char>(&buffer[0]), sizeoffile); 

If someday the vector's element type changes to wchar_t, this invocation will fail to compile, whereas a plain reinterpret_cast would silently start doing the wrong thing.
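
A compilable variant of this helper might look like the sketch below. It assumes C++17 and uses `std::make_signed_t` as a stand-in for the "remove unsigned" step, which is my substitution rather than something the answer specifies; `fill_from` and `treat_as_example` are hypothetical names added for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <type_traits>
#include <vector>

// Only participates in overload resolution when T and U differ at most
// in signedness (for example char and unsigned char).
template <typename T, typename U>
auto treat_as(U* ptr)
    -> std::enable_if_t<
           std::is_same_v<std::make_signed_t<T>, std::make_signed_t<U>>, T>*
{
    return reinterpret_cast<T*>(ptr);
}

// Hypothetical caller: fills an unsigned char buffer from any istream
// and returns the number of bytes actually read.
inline std::size_t fill_from(std::istream& in, std::vector<unsigned char>& buffer)
{
    in.read(treat_as<char>(buffer.data()),
            static_cast<std::streamsize>(buffer.size()));
    return static_cast<std::size_t>(in.gcount());
}

inline void treat_as_example()
{
    std::istringstream in(std::string("\x80\x7f", 2));
    std::vector<unsigned char> buffer(2);
    assert(fill_from(in, buffer) == 2);
    assert(buffer[0] == 0x80);
}
```

With this version, `treat_as<char>` applied to the data pointer of a `std::vector<wchar_t>` is rejected at compile time, because `std::make_signed_t<wchar_t>` is not the same type as `std::make_signed_t<char>` on platforms where `wchar_t` is wider than one byte.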

Ben Voigt

The similarity between char and unsigned char is a red herring here: for any trivially copyable type, you can reinterpret_cast its address to char* for filling via istream::read because char has special permission to alias any type. (Arguably it should work even for types like std::tuple<int> with trivial copy constructors but non-trivial copy assignment operators, but the standard doesn’t promise that. On the other hand, pointers are trivially copyable, but that doesn’t mean you can load pointer values from other executions!)

You have to use sizeof in general, of course; it might be wise to use it even if it’s 1 to protect against future type changes.
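
As a rough sketch of that advice, a read for an arbitrary trivially copyable element type could scale the byte count by `sizeof(T)`. The names `read_objects` and `read_objects_example` are hypothetical, C++17 is assumed, and the sketch additionally requires `T` to be default-constructible so the vector can be pre-sized.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <type_traits>
#include <vector>

// Hypothetical helper: reads up to `count` objects of T as raw bytes.
template <typename T>
std::vector<T> read_objects(std::istream& in, std::size_t count)
{
    static_assert(std::is_trivially_copyable_v<T>,
                  "byte-wise reads need a trivially copyable type");
    std::vector<T> values(count);
    // The byte count is always scaled by sizeof(T), even when it is 1.
    in.read(reinterpret_cast<char*>(values.data()),
            static_cast<std::streamsize>(count * sizeof(T)));
    // Keep only the elements that were read completely.
    values.resize(static_cast<std::size_t>(in.gcount()) / sizeof(T));
    return values;
}

// Round trip through an in-memory stream within a single execution.
inline void read_objects_example()
{
    const std::uint32_t source[2] = {1u, 0xdeadbeefu};
    std::stringstream stream;
    stream.write(reinterpret_cast<const char*>(source), sizeof source);
    const std::vector<std::uint32_t> copy =
        read_objects<std::uint32_t>(stream, 2);
    assert(copy.size() == 2 && copy[1] == 0xdeadbeefu);
}
```

The round trip only demonstrates the mechanics; data written on a different platform may still differ in endianness or layout, and pointer values never remain meaningful across executions.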

Davis Herring
  • That's true, but it seems to me there's a difference between "get me the raw content (bytes) of this binary file" and "treat this binary file as an array of type `T`, get them for me". The former is free of endianness and alignment concerns, and it seems to be what OP is after (but I could be mistaken, the question doesn't really emphasize this). – Ben Voigt Jun 19 '20 at 19:37
  • Of course, file streams are for formatted I/O, they don't actually provide raw binary file access, because the codecvt facet is always part of the processing chain. One implication of that is that you can no longer really trust direct mapping of the bytes onto types other than the one output by the codecvt (and signed/unsigned variations thereof) – Ben Voigt Jun 19 '20 at 19:39
  • @BenVoigt: The codecvt facet is even part of the `filebuf`, so formally there is no way to do “raw” access except by virtue of the “C” locale. – Davis Herring Jun 19 '20 at 21:52
  • Correct @Davis, iostreams (of which `filebuf` is part) *cannot* do "raw" access. They are a formatted I/O library. OS APIs naturally provide raw access and probably someday we'll have a more composable replacement for iostreams in the C++ specification that properly separates the different concerns. – Ben Voigt Jun 22 '20 at 16:35