
I have seen scripts that deal with archives or binary data, or that copy files without using Python's built-in functions, and they use terms such as chunk, block, offset, buffer, and sector.

I have created a Python application in which a few of the requirements (archiving and extracting data) are met by external libraries or binaries. I would now like to dive deeper and bring those third-party features into my application by writing a module of my own. To do that, I need to know what those terms mean and where I can get started. Is there any documentation on this subject?

Any Python-specific documentation on those terms would also be appreciated.

Alfe
sundar_ima
  • the interpretation of those words will be slightly different in each implementation... but why not use an online technical computing dictionary to get the general idea? – isedev Feb 25 '14 at 08:26
  • All leading to one-line explanations. But I want a detailed explanation. – sundar_ima Feb 25 '14 at 09:10
  • but that's my point... the detailed explanation will differ for each implementation. A chunk/block/sector may mean something different depending on whether you're looking at a `tar`, `cpio` or `cab` archive, a Microsoft CBF, or whatever else. `offset` should (hopefully) mean the same thing, but it's relative to something (*offset from what*). – isedev Feb 25 '14 at 09:13

1 Answer


A chunk is any (typically rather large) amount of data that is only part of a whole; its size is arbitrary, e.g. the first 1000 bytes of a file. The next 3000 bytes could be the next chunk.
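As a rough illustration of chunks, here is a minimal sketch that reads a stream in pieces of differing sizes. The helper name `read_chunks` and the sizes are made up for this example; an in-memory `io.BytesIO` stands in for a real file.

```python
import io

def read_chunks(stream, sizes):
    """Yield successive chunks of the requested sizes from a binary stream."""
    for size in sizes:
        chunk = stream.read(size)
        if not chunk:  # nothing left to read
            break
        yield chunk

# 5000 zero bytes stand in for the contents of a file.
data = io.BytesIO(bytes(5000))
chunks = list(read_chunks(data, [1000, 3000, 3000]))
lengths = [len(c) for c in chunks]
print(lengths)  # [1000, 3000, 1000]
```

The last chunk is shorter than requested because only 1000 bytes were left; nothing forces chunks to have the same size.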

A block is a fixed amount of data (its size is usually dictated by technical constraints) that is typically only part of a whole, e.g. the first 1024 bytes of a file. The next block is then also 1024 bytes long. Sometimes not all of a block is used: in a file of 1034 bytes, the second and last block is still 1024 bytes large, but only 10 bytes of it are in use.
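The arithmetic behind the 1034-byte example can be sketched like this. `block_usage` is a hypothetical helper, and the block size of 1024 is just the value used in the paragraph above.

```python
BLOCK_SIZE = 1024  # block size from the example above

def block_usage(file_size, block_size=BLOCK_SIZE):
    """Return (number of blocks occupied, bytes in use in the last block)."""
    if file_size == 0:
        return 0, 0
    full_blocks, rest = divmod(file_size, block_size)
    if rest == 0:
        # The file ends exactly on a block boundary.
        return full_blocks, block_size
    return full_blocks + 1, rest

print(block_usage(1034))  # (2, 10)
print(block_usage(2048))  # (2, 1024)
```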

An offset is a positional distance, typically between the beginning of something and the position of interest; e.g. if the temperature in a file of weather data is stored 23 bytes after the beginning of the file, then the temperature's offset is 23 bytes. It can also be a shift of data positions: if a file is corrupted because 32 zeros were inserted at its beginning (or similar), every byte sits 32 bytes further back than expected, and the whole file has an offset of 32 bytes.
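In Python, an offset is what you pass to `seek()`. The sketch below assumes a made-up record layout in which the temperature is a single signed byte at offset 23; the layout, the names, and the value 21 are all invented for illustration.

```python
import io
import struct

TEMPERATURE_OFFSET = 23  # hypothetical position of the temperature byte

def read_temperature(stream, offset=TEMPERATURE_OFFSET):
    """Jump to the given offset and decode one signed byte."""
    stream.seek(offset)  # offset counted from the start of the stream
    (value,) = struct.unpack("b", stream.read(1))
    return value

# Build a fake 64-byte record with a temperature of 21 at offset 23.
record = bytearray(64)
record[TEMPERATURE_OFFSET] = 21
temperature = read_temperature(io.BytesIO(bytes(record)))
print(temperature)  # 21

# A "shifted" file: 32 zero bytes were inserted at the beginning,
# so every field is found 32 bytes later than expected.
shifted = io.BytesIO(bytes(32) + bytes(record))
shifted_temperature = read_temperature(shifted, TEMPERATURE_OFFSET + 32)
print(shifted_temperature)  # 21
```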

A buffer is a piece of memory in which data is collected so that it can be processed as a whole once the buffer is full (or nearly full). A typical example is buffered output: single characters are buffered until a line is complete, and then the whole line is printed to the terminal in one write operation. Some buffers have a fixed size, others just have an upper limit.
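A toy version of line-buffered output might look like the following. `LineBuffer` is a hypothetical class written for this answer, not something from the standard library (Python's own file objects already buffer for you); the `sink` callable stands in for the real write operation.

```python
class LineBuffer:
    """Collect characters and pass them on one whole line at a time."""

    def __init__(self, sink):
        self._sink = sink    # callable that receives each complete line
        self._pending = []   # characters collected so far

    def write(self, text):
        for char in text:
            self._pending.append(char)
            if char == "\n":  # the line is complete: hand it on
                self.flush()

    def flush(self):
        """Pass on whatever is pending, even if the line is incomplete."""
        if self._pending:
            self._sink("".join(self._pending))
            self._pending.clear()

written = []                   # records every "write operation"
out = LineBuffer(written.append)
out.write("hel")               # nothing is written yet
out.write("lo\nwor")           # "hello\n" goes out in one piece
out.flush()                    # forces out the incomplete "wor"
print(written)  # ['hello\n', 'wor']
```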

A sector is like a block, a fixed-size part of a whole, but tied even more closely to a technical origin. The whole in this case is often a piece of hardware (like a hard drive or a CD). How sectors and blocks relate depends on the system: on hard drives, for example, a file-system block usually spans one or more sectors.
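Reading one sector from a disk image then combines the ideas above: the sector index times the sector size gives the offset. This is only a sketch; the sector size of 512 bytes is an assumption (real media vary), `read_sector` is a made-up helper, and an in-memory stream stands in for an actual image file.

```python
import io

SECTOR_SIZE = 512  # assumed; the real value depends on the medium

def read_sector(image, index, sector_size=SECTOR_SIZE):
    """Return the raw bytes of sector number `index` (counted from 0)."""
    image.seek(index * sector_size)  # offset of the sector's first byte
    data = image.read(sector_size)
    if len(data) != sector_size:
        raise ValueError(f"sector {index} is incomplete or out of range")
    return data

# A fake three-sector image: sector n is filled with the byte value n.
image = io.BytesIO(bytes([0]) * 512 + bytes([1]) * 512 + bytes([2]) * 512)
second = read_sector(image, 1)
print(second[:4])  # b'\x01\x01\x01\x01'
```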

Alfe
  • Thank you for the detailed answer. Is there any documentation relevant to this? – sundar_ima Feb 25 '14 at 15:39
  • I'm not aware of official documentation concerning these terms. Do not forget that computer science is a rather new field of expertise; a lot of the terms and words are used more or less on a conventional basis derived from what the programmers of the 50s, 60s, and 70s invented. There hasn't been a duration of centuries (as in most other sciences) in which theoreticians could chew every conceived idea over and over until a pabulum had been reached and the terms were solidified. To search for "official" citations is a bit like asking for references in youth slang terms. – Alfe Feb 25 '14 at 21:33