Suppose I am trying to represent the contents of a tar file in a C++ struct. Each 512-byte block of a tar file is either a header (which in turn has 2 possible versions) or a payload; headers are padded to fill the block. Each possible form of a 512-byte block looks roughly like this (simplified):
             +-------+------+-----+------+---------+-----------+-----------------------+
header_v1 -> | fname | mode | uid | size |  ln_tp  |  ln_file  |+++++++++++++++++++++++|
             +-------+------+-----+------+---------+-----------+-------+--------+------+
header_v2 -> |      old_header_data      |  ln_tp  |  ln_file  | other | fields |++++++|
             +---------------------------+---------+-----------+-------+--------+------+
payload   -> |                                raw_data                                 |
             +-------------------------------------------------------------------------+
As you can see, some fields overlap (such as ln_tp and ln_file), and header_v2 also reuses the header_v1 fields, shown as old_header_data. Finally, a raw_data field is used both for header padding and for the actual file contents.
I have created the following structures to model this (simplified as well to match the previous representation, array sizes will not be correct):
struct pre_posix_t {
    // Pre-POSIX.1-1988 format
    std::array<char, 100> fname;
    std::array<char, 8> mode;
    std::array<char, 8> uid;
    std::array<char, 12> size;
    fd_type_pre link_type; // fd_type_pre is an enum with the allowed values (char)
    std::array<char, 100> link_name;
};

struct ustar_t {
    std::array<char, 156> pre_posix; // first 156 bytes of Pre-POSIX.1-1988 format (thus excluding link_type and link_name)
    fd_type_ustar link_type; // fd_type_ustar is an enum with the allowed values (char, extends fd_type_pre)
    std::array<char, 100> link_name;
    std::array<char, 8> other;
    std::array<char, 32> fields;
    // ...
};

using header_t = std::variant<pre_posix_t, ustar_t>;
using raw_block_t = std::array<char, 512>;

struct tar_t {
    // ...
    std::variant<header_t, raw_block_t> data;
};

using archive_t = std::vector<tar_t>;
Is this a good representation? What would be the idiomatic way of manipulating this data in C++? I'm worried about v2's old_header_data shadowing the v1 field values, and about the overlap of link_type and link_name between the two versions. I'm also unsure whether std::variant is the best way to handle those conditions, in terms of offering a good API for manipulation while keeping the types right.
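To make the concern concrete, here is a minimal sketch of how I currently imagine reading this data through std::visit. It is an illustration under assumptions, not a settled design: the structs are cut-down stand-ins, link_type is a plain char instead of the enums, and block_t, overloaded, link_name_of and describe are hypothetical names that do not exist in my real code.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <string>
#include <variant>

// Cut-down stand-ins for the structs above (assumption: link_type as plain char).
struct pre_posix_t {
    std::array<char, 100> fname{};
    char link_type{};
    std::array<char, 100> link_name{};
};
struct ustar_t {
    std::array<char, 156> pre_posix{};
    char link_type{};
    std::array<char, 100> link_name{};
};

using header_t = std::variant<pre_posix_t, ustar_t>;
using raw_block_t = std::array<char, 512>;
using block_t = std::variant<header_t, raw_block_t>;  // hypothetical alias

// Usual helper for building a visitor out of several lambdas.
template <class... Ts> struct overloaded : Ts... { using Ts::operator()...; };
template <class... Ts> overloaded(Ts...) -> overloaded<Ts...>;

// A field shared by both versions can be read with one generic lambda.
std::string link_name_of(const header_t& h) {
    return std::visit([](const auto& v) {
        // The field is NUL-padded, so stop at the first NUL (or the field end).
        auto end = std::find(v.link_name.begin(), v.link_name.end(), '\0');
        return std::string(v.link_name.begin(), end);
    }, h);
}

// The nested variant needs one visit per level (here: outer visit, inner check).
const char* describe(const block_t& b) {
    return std::visit(overloaded{
        [](const header_t& h) {
            return std::holds_alternative<ustar_t>(h) ? "ustar header"
                                                      : "pre-POSIX header";
        },
        [](const raw_block_t&) { return "payload"; }
    }, b);
}
```

The shared fields are easy to reach this way, but every access to a v1 field inside a ustar_t would still have to go through the opaque pre_posix bytes, which is the part that feels wrong to me.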
For example, if I were to construct a v2 header manually, how could I set fname, mode and the fields exclusive to v2 while working with a header_t? Perhaps by creating a pre_posix_t, converting it to a std::array<char, 156> with some conversion function, and then storing the result in a ustar_t's pre_posix member?
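A rough sketch of that conversion idea, so it is clear what I mean. Everything here is an assumption for illustration: the structs are simplified, so the prefix comes out at 128 bytes rather than the real 156, link_type is a plain char, and prefix_size, set_field, to_prefix and make_v2 are hypothetical helpers I made up.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <string_view>
#include <variant>

// Simplified stand-ins (assumption: sizes are illustrative, not the real layout).
struct pre_posix_t {
    std::array<char, 100> fname{};
    std::array<char, 8> mode{};
    std::array<char, 8> uid{};
    std::array<char, 12> size{};
    char link_type{};
    std::array<char, 100> link_name{};
};

// Bytes that precede link_type in the simplified v1 layout.
constexpr std::size_t prefix_size = 100 + 8 + 8 + 12;

struct ustar_t {
    std::array<char, prefix_size> pre_posix{};
    char link_type{};
    std::array<char, 100> link_name{};
    std::array<char, 8> other{};
    std::array<char, 32> fields{};
};

using header_t = std::variant<pre_posix_t, ustar_t>;

// Hypothetical helper: write a string into a fixed-width, NUL-padded field,
// truncating it if it does not fit.
template <std::size_t N>
void set_field(std::array<char, N>& field, std::string_view value) {
    field.fill('\0');
    std::copy_n(value.begin(), std::min(value.size(), N), field.begin());
}

// Hypothetical conversion: lay out the v1 fields before link_type back to back.
std::array<char, prefix_size> to_prefix(const pre_posix_t& h) {
    std::array<char, prefix_size> out{};
    auto it = out.begin();
    it = std::copy(h.fname.begin(), h.fname.end(), it);
    it = std::copy(h.mode.begin(), h.mode.end(), it);
    it = std::copy(h.uid.begin(), h.uid.end(), it);
    std::copy(h.size.begin(), h.size.end(), it);
    return out;
}

// Build a v2 header by filling a temporary v1 header first.
header_t make_v2(std::string_view name, std::string_view mode) {
    pre_posix_t old{};
    set_field(old.fname, name);
    set_field(old.mode, mode);

    ustar_t v2{};
    v2.pre_posix = to_prefix(old);
    set_field(v2.other, "ustar");  // stand-in for a v2-only field
    return v2;
}
```

This works, but it needs a throwaway pre_posix_t for every v2 header, and once the data is inside pre_posix there is no typed way to change fname or mode again, hence the question.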
Since std::variant is similar to a union, should I expect v1 and v2 to already be padded to 512 bytes?
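As far as I can tell, a std::variant is only as large as its biggest alternative plus some bookkeeping for the active index, so I suspect padding has to be done explicitly. The sketch below is how I would check that and pad by hand. It rests on assumptions: small_header_t, large_header_t, any_header_t, block_size and to_block are hypothetical names, and the byte copy is only meaningful for structs made purely of char arrays (no internal padding).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <variant>

// Simplified stand-ins; the sizes are illustrative only.
struct small_header_t { std::array<char, 100> bytes{}; };
struct large_header_t { std::array<char, 300> bytes{}; };
using any_header_t = std::variant<small_header_t, large_header_t>;

constexpr std::size_t block_size = 512;

// The variant has room for its largest alternative, but it is not a 512-byte block.
static_assert(sizeof(any_header_t) >= sizeof(large_header_t));
static_assert(sizeof(any_header_t) < block_size);

// Hypothetical helper: copy a header into a zero-filled 512-byte block.
template <class T>
std::array<char, block_size> to_block(const T& header) {
    static_assert(std::is_trivially_copyable_v<T>, "header must be byte-copyable");
    static_assert(sizeof(T) <= block_size, "header must fit in one block");
    std::array<char, block_size> block{};  // value-initialized, so the tail is all zeros
    std::memcpy(block.data(), &header, sizeof(T));
    return block;
}
```

If that reading is right, the variant would only ever be the in-memory model, and the 512-byte layout would be produced at serialization time.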