
I have a simple question, but I could not find much information about it, and I don't fully understand what I did find.

When I open a tar archive in Python with `tarfile.open()`, in what order are the files in it read? My archive contains data on people: each person has their own folder, and within that folder the data is split across several subfolders.

Will the files be accessed according to the archive's internal structure, or is there another way to determine which file will be accessed next when I use `TarFile.extractfile()`?

Thank you in advance

Paul Rooney
Georgi Nikolov

1 Answer


Internal structure. tar stands for "tape archive", and its central design goal is to work sequentially with little RAM while writing to (or reading from) a sequential-access I/O device such as a tape drive, where loading everything into memory and then processing it in some chosen order was not possible. Files are therefore extracted in the order in which they are stored in the archive, by reading the archive from start to finish.
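As a minimal sketch of this behaviour, the snippet below builds a small archive in memory and then iterates over it with the standard `tarfile` module. The member names and the helper functions `build_archive` and `read_order` are made up for illustration; only `tarfile.open()`, `TarInfo`, `addfile()` and `extractfile()` come from the library.

```python
import io
import tarfile


def build_archive(names):
    """Create an in-memory tar archive, adding members in the given order."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in names:
            data = name.encode()  # dummy file content
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


def read_order(buf):
    """Return member names in the order that iteration yields them."""
    names = []
    with tarfile.open(fileobj=buf, mode="r") as tar:
        for member in tar:  # walks the archive front to back
            fileobj = tar.extractfile(member)
            if fileobj is not None:  # None for directories and similar
                fileobj.read()
            names.append(member.name)
    return names


# Hypothetical per-person layout, deliberately not in alphabetical order.
added = ["zoe/scans/a.txt", "adam/notes/b.txt", "mia/scans/c.txt"]
order = read_order(build_archive(added))
print(order)
```

Under these assumptions the members come back in the order they were added, not sorted by name. Note that `extractfile()` reads whichever member it is given, so the "next" file is simply the next one the loop hands to it.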

Amadan
  • OK, thank you for the clarification. Does that mean it will start by reading the first file it finds in the first folder (alphabetically)? – Georgi Nikolov Jun 01 '15 at 08:40
  • Not alphabetically - in the order the files were put into the archive. They will come out alphabetically if and only if you created the archive by adding them alphabetically. – Amadan Jun 01 '15 at 08:50
  • To elaborate on @Amadan's answer (and probably stating the obvious): if you run `tar tvf $TAR_ARCHIVE_PATH`, you will see the order of the files in the archive. – boardrider Jun 02 '15 at 12:11
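A rough Python counterpart to listing the archive from the command line, assuming only the standard library: `TarFile.getnames()` returns the member names in archive order. The file names, the `demo.tar` archive and the `archive_order` helper are hypothetical and exist only for this demonstration.

```python
import os
import tarfile
import tempfile


def archive_order(path):
    """Return member names in the order they are stored in the archive."""
    with tarfile.open(path) as tar:
        return tar.getnames()


with tempfile.TemporaryDirectory() as tmp:
    # Create two small files, then archive them in non-alphabetical order.
    for name in ("b.txt", "a.txt"):
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write(name)

    archive_path = os.path.join(tmp, "demo.tar")
    with tarfile.open(archive_path, "w") as tar:
        for name in ("b.txt", "a.txt"):
            tar.add(os.path.join(tmp, name), arcname=name)

    listing = archive_order(archive_path)

print(listing)
```

To get alphabetical order when reading, one option is to sort the names before adding them at creation time, since the archive preserves whatever order it was written in.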