Premise
- I have a blob of binary data in memory, represented as a char* (maybe read from a file, or received over the network).
- I know that it contains a UTF-8-encoded text field of a certain length at a certain offset.
Question
How can I (safely and portably) get a u8string_view that represents the contents of this text field?
Motivation
The motivation for passing the field to downstream code as a u8string_view is:
- It clearly communicates that the text field is UTF-8-encoded, which a plain string_view does not.
- It avoids the cost (likely a free-store allocation plus a copy) of returning it as a u8string.
What I tried
The naive way to do this would be:
```cpp
char* data = ...;
size_t field_offset = ...;
size_t field_length = ...;
char8_t* field_ptr = reinterpret_cast<char8_t*>(data + field_offset);
u8string_view field(field_ptr, field_length);
```
However, if I understand the C++ strict-aliasing rules correctly, this is undefined behavior: it accesses the contents of the char* buffer through the char8_t* pointer returned by reinterpret_cast, and char8_t, unlike char, unsigned char and std::byte, is not an aliasing type.
Is that true?
Is there a way to do this safely?