I have read on Wikipedia (Chunk-based formats) what a chunk-based data format is, but I have a few questions about where the "header" (non-data) part of the file lives.

I can think of two approaches. In the first, a single header describes where every chunk lives; according to Wikipedia, the information needed to achieve this can be given by

start- and end-markers of some kind, by an explicit length field somewhere, or by fixed requirements of the file format's definition

which would presumably all live in the header, such as:

Header

Number of entries: 2
Bytes per element: 1

Data

'H''E'

The header is followed by the data, where each element is n bytes long and there are "number of entries" elements in total.

I can see an advantage here: the header gives you immediate access to any element you want. The disadvantage, however, is that each chunk is not self-contained.
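To make the single-header layout concrete, here is a minimal Python sketch. The field widths (two little-endian 32-bit integers) and the names `HEADER`, `pack_file` and `read_element` are my own assumptions for illustration; nothing in the Wikipedia description prescribes them.

```python
import struct

# Assumed header layout: (number of entries, bytes per element),
# both stored as little-endian unsigned 32-bit integers.
HEADER = struct.Struct("<II")


def pack_file(elements, element_size):
    """Write one global header followed by fixed-size elements."""
    for element in elements:
        if len(element) != element_size:
            raise ValueError("every element must be exactly element_size bytes")
    return HEADER.pack(len(elements), element_size) + b"".join(elements)


def read_element(blob, index):
    """Jump straight to element `index` using only the global header."""
    count, size = HEADER.unpack_from(blob, 0)
    if not 0 <= index < count:
        raise IndexError(index)
    start = HEADER.size + index * size
    return blob[start:start + size]


blob = pack_file([b"H", b"E"], 1)
print(read_element(blob, 1))
```

The random access comes from the arithmetic in `read_element`: the position of any element follows from the header alone, but the data section cannot be interpreted without that header.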

The second approach is to have a main header that contains some, but not all, of the information, while each chunk contains its own (header, data) pair, making it self-contained.

Variable chunks based on IDs -- minimal header

Number of offsets (each is 64 bits wide)
Offset of chunk ID 1
Offset of chunk ID 2
Offset of chunk ID 3
(Note: if chunk ID 1 contains further mini-chunks, they are not listed here)

Element
chunk ID 1
Number of elements
Number of bytes per element
....Data....

Element
chunk ID 2
Number of elements
Number of bytes per element
....Data....
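
A rough Python sketch of this second layout, to show what I mean by a minimal main header plus self-contained chunks. The 64-bit offsets come from the layout above; the 32-bit chunk header fields and the names `U64`, `CHUNK_HEADER`, `pack_file` and `read_chunk` are assumptions made up for the example.

```python
import struct

U64 = struct.Struct("<Q")  # offsets are 64 bits wide, as in the layout above
# Assumed per-chunk header: (chunk ID, number of elements, bytes per element).
CHUNK_HEADER = struct.Struct("<III")


def pack_file(chunks):
    """chunks: list of (chunk_id, element_size, elements)."""
    bodies = []
    for chunk_id, size, elements in chunks:
        if any(len(element) != size for element in elements):
            raise ValueError("element size mismatch in chunk %d" % chunk_id)
        header = CHUNK_HEADER.pack(chunk_id, len(elements), size)
        bodies.append(header + b"".join(elements))

    # The main header only records the count and where each chunk starts.
    position = U64.size * (1 + len(bodies))
    offsets = []
    for body in bodies:
        offsets.append(position)
        position += len(body)
    table = U64.pack(len(offsets)) + b"".join(U64.pack(o) for o in offsets)
    return table + b"".join(bodies)


def read_chunk(blob, index):
    """Locate a chunk via the offset table, then decode it from its own header."""
    (count,) = U64.unpack_from(blob, 0)
    if not 0 <= index < count:
        raise IndexError(index)
    (offset,) = U64.unpack_from(blob, U64.size * (1 + index))
    chunk_id, n, size = CHUNK_HEADER.unpack_from(blob, offset)
    start = offset + CHUNK_HEADER.size
    elements = [blob[start + i * size:start + (i + 1) * size] for i in range(n)]
    return chunk_id, elements


blob = pack_file([(1, 1, [b"H", b"E"]), (2, 2, [b"LL", b"O!"])])
print(read_chunk(blob, 1))
```

Here the offset table is only an index: each chunk could be cut out of the file and still be decoded, because its own header carries the element count and size.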

The number of bytes per element could also be derived from the ID by the program reading the format, rather than being stored in the file. Each chunk would then need only the ID and the number of elements. This could also make the structure recursive or hierarchical, as each element can then be of variable size depending on what its own header says.
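
As a sketch of that idea, assume the reader ships with a hypothetical ID table (`ELEMENT_SIZES`) and one made-up container ID (`CONTAINER_ID`) whose elements are themselves chunks. None of these IDs or names come from a real format; they only illustrate how the recursion could work.

```python
import struct

# Assumed chunk header: (chunk ID, number of elements), no size field in the file.
CHUNK_HEADER = struct.Struct("<II")

# Hypothetical ID table built into the reader rather than stored in the file.
ELEMENT_SIZES = {1: 1, 2: 4}  # leaf chunk ID -> bytes per element
CONTAINER_ID = 100            # elements of this chunk are themselves chunks


def pack_chunk(chunk_id, elements):
    header = CHUNK_HEADER.pack(chunk_id, len(elements))
    if chunk_id == CONTAINER_ID:
        return header + b"".join(pack_chunk(cid, els) for cid, els in elements)
    size = ELEMENT_SIZES[chunk_id]
    if any(len(element) != size for element in elements):
        raise ValueError("element size does not match chunk ID %d" % chunk_id)
    return header + b"".join(elements)


def parse_chunk(blob, offset=0):
    """Return ((chunk_id, elements), offset just past this chunk)."""
    chunk_id, count = CHUNK_HEADER.unpack_from(blob, offset)
    offset += CHUNK_HEADER.size
    elements = []
    if chunk_id == CONTAINER_ID:
        # Variable-size elements: each child reports its own extent.
        for _ in range(count):
            child, offset = parse_chunk(blob, offset)
            elements.append(child)
    else:
        size = ELEMENT_SIZES[chunk_id]  # looked up by ID, not read from the file
        for _ in range(count):
            elements.append(blob[offset:offset + size])
            offset += size
    return (chunk_id, elements), offset


tree = (CONTAINER_ID, [(1, [b"H", b"E"]), (2, [b"ABCD"])])
blob = pack_chunk(*tree)
parsed, end = parse_chunk(blob)
print(parsed)
```

The trade-off in this sketch is that a reader that does not know an ID cannot skip over that chunk, since the size is no longer in the file.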

What is it that makes it chunk based?

  • Are you talking about HTTP Chunks here or is this just a theoretical question? – Matthias Aug 11 '15 at 12:38
  • It is just a theoretical question, essentially: what is the basis of chunk-based formats? – Har Aug 11 '15 at 12:42
  • Maybe it helps to read about the HTTP chunked transfer encoding: https://en.wikipedia.org/wiki/Chunked_transfer_encoding – Matthias Aug 11 '15 at 12:47
  • I think OP is more thinking along the lines of [RIFF](https://en.wikipedia.org/wiki/Resource_Interchange_File_Format) chunks, rather than HTTP. – Kninnug Aug 11 '15 at 12:48
  • @Kninnug yes, that is right; however, I didn't know HTTP chunks existed – Har Aug 11 '15 at 13:17
  • This question asks for a highly opinion-based answer. I suggest you ask more specific questions at the end. "Which would you consider as being chunk based?" for example, in the Wikipedia article you linked you'll see opinions vary, e.g. XML is considered a chunk-based file format by some (as it's an example in the article), but not by me. – Veltas Aug 11 '15 at 14:09
  • But based on what criteria? That is what I am trying to find out... as I can't find a more formal definition :( – Har Aug 11 '15 at 14:14
  • (Off topic: I would be interested in why people need formal definitions. The inventor can call it whatever he or she wants, can't they? Just in case you create a protocol or file format :) – Matthias Aug 11 '15 at 14:32
  • But there's a family of these things, and the word the inventor used is used everywhere. There must be a reason, relationship or similarity behind that... – Har Aug 11 '15 at 15:00
  • "which would presumably all live in the header" - No. Start and end markers may surround the chunk itself. Also, each chunk may have its own header ("chunk header"). You ask, IIUC, essentially "what makes a chunked format a chunked format" - I think the answer is as simple as "that it divides the content into chunks". – davmac Aug 11 '15 at 15:49
