In the GNU documentation of the POSIX tar archive format, the header structure is defined as:

struct posix_header
{                              /* byte offset */
  char name[100];               /*   0 */
  char mode[8];                 /* 100 */
  char uid[8];                  /* 108 */
  char gid[8];                  /* 116 */
  char size[12];                /* 124 */
  char mtime[12];               /* 136 */
  char chksum[8];               /* 148 */
  char typeflag;                /* 156 */
  char linkname[100];           /* 157 */
  char magic[6];                /* 257 */
  char version[2];              /* 263 */
  char uname[32];               /* 265 */
  char gname[32];               /* 297 */
  char devmajor[8];             /* 329 */
  char devminor[8];             /* 337 */
  char prefix[155];             /* 345 */
                                /* 500 */
};

The size field of the header is defined as a char array of length 12, and the field appears to occupy 12 bytes (inferred from the byte offset comments). In theory this provides 12 bytes (= 96 bits) of space to store an unsigned integer. However, I suspect this is not the case.

  • Is the max size value equal to just 12 digits (999,999,999,999)? Or
  • Since this size value represents the number of bytes in the file, does that mean the size value might not be completely accurate since the data size might equal a number of bits that isn't divisible by 8? Or do files always get saved in increments of 8 bits (with unused bits padded out to fill an entire byte), and thus the data length of bits will always be divisible by 8?
Rafe

2 Answers

According to the standard documentation:

The name, linkname, magic, uname, and gname are null-terminated character strings. All other fields are zero-filled octal numbers in ASCII. For historical reasons, a final NUL or space character should also be used.

Therefore, 11 bytes give you 11 octal digits (0..77777777777 in octal, i.e. the range 0..0x1FFFFFFFF), which your program needs to convert to a binary representation in whatever way you find suitable, for example like this:

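uint64_t size;  /* needs <inttypes.h> and <stdio.h> */
/* The field width of 11 keeps sscanf from reading past the size field. */
sscanf(header->size, "%11" SCNo64, &size);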
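If you would rather not depend on `sscanf`, the conversion can also be written by hand. The following is only a sketch: `parse_octal_field` is a hypothetical helper name, not something from the tar sources, and it assumes the field holds plain ASCII octal digits, possibly padded with leading spaces and ended by a NUL or a space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: converts a tar numeric field of `len` bytes
 * (ASCII octal digits, optional leading spaces, NUL or space terminator)
 * to an integer. Returns 0 on success, -1 if a non-octal byte is found.
 * Assumes len <= 21 so the value cannot overflow 64 bits. */
int parse_octal_field(const char *field, size_t len, uint64_t *out)
{
    uint64_t value = 0;
    size_t i = 0;

    assert(field != NULL && out != NULL);

    /* Some writers pad with leading spaces instead of zeros. */
    while (i < len && field[i] == ' ')
        i++;

    for (; i < len; i++) {
        char c = field[i];
        if (c == '\0' || c == ' ')
            break;                      /* terminator reached */
        if (c < '0' || c > '7')
            return -1;                  /* not an octal digit */
        value = (value << 3) | (uint64_t)(c - '0');  /* 3 bits per digit */
    }

    *out = value;
    return 0;
}
```

With the struct from the question, a call would look like `parse_octal_field(header->size, sizeof header->size, &size)`. Because the loop is bounded by `len`, it never reads into the neighbouring `mtime` field even if the terminator is missing.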

Sergey Kalinichenko
  • does that mean that each byte contains an ascii character that has a character value of a digit between 0 and 7? and that the 11 characters represent an octal number (a number in base 8)? So you would take that base-8 number and convert it to base-10 to get the "normal" number of bytes that the file is? and thus 77777777777 (base-8) = 8589934591(base-10) would be the largest value? – Rafe Oct 25 '18 at 17:07
  • and that would be just under 8 GB in size? – Rafe Oct 25 '18 at 17:08
  • @Rafe Yes, each byte contains an ASCII char representing a digit `'0'` through `'7'`, inclusive, with leading zeros to pad to length and a null terminator in the 12th byte. 8589934591 (decimal) is the largest value, for a max size of just under 8 GB. – Sergey Kalinichenko Oct 25 '18 at 17:13
  • cool. any insight into why they chose octal digits? isn't there plenty of room to store one digit of a larger number base in a byte? – Rafe Oct 25 '18 at 17:40
  • @Rafe I know that PDP-11 people loved octal because you could read machine code in base-8 with relative ease ([how?](https://stackoverflow.com/q/48726139/335858)). I suspect that's why octal representation made it into the C standard in the first place. The header was developed by the same people in the late seventies, so my only guess is that they picked octal out of personal preference, rather than for any technical reason. – Sergey Kalinichenko Oct 25 '18 at 17:59
Each of the fields in the header is stored as a null-terminated string. In the case of the file size, it is stored as an octal string.

So you have a total of 11 octal characters (leaving room for the null byte). Each octal digit carries 3 bits, which gives 33 bits for the file size, or a maximum of just under 8 GB (8589934591 bytes).
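The arithmetic can be checked with a few lines of C. This is just an illustrative sketch: `max_octal_value` is a made-up helper name, and it relies only on the fact that each octal digit encodes 3 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: largest value that fits in `digits` octal digits,
 * i.e. 8^digits - 1. Valid for digits <= 21 (63 bits). */
uint64_t max_octal_value(unsigned digits)
{
    assert(digits <= 21);
    return (UINT64_C(1) << (3 * digits)) - 1;  /* 3 bits per octal digit */
}
```

For the 11 usable digits of the size field, `max_octal_value(11)` is 2^33 - 1 = 8589934591, which matches 77777777777 in octal and 0x1FFFFFFFF in hex.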

dbush