How to know the files inside the Tar parser

Question

I am developing a Visual C++ application. I need to find out which file types (for example .png, .html or .txt) are present inside a tar file, using only C++ code and no external commands. I have some background from the linked question on how to parse a tar file, where I learned that the contents of a file inside the tar file start at buffer[512]. My first question is:

(1.) Suppose there is more than one file in the tar file. I read the size from (&buffer[124], 11), and the contents of that file run from offset 512 for that many bytes; I stored them in a buffer for my own use. But as I understand it, the rule that the contents start at offset 512 only holds for the first file in the tar file. How do I get the position, contents and size of a file that is the third or fourth entry, or when I am not sure where the file sits in the tar file at all?

(2.) Am I thinking about this correctly? To reach the next file's contents I would use 512*2, because the first file's contents start at offset 512, so the next file's would start at 1024. I am sure this approach is wrong, but could anyone please correct it?

What I actually need is to store only the HTML file contents from the tar file (which contains a number of files of different types) in my buffer.

http://en.wikipedia.org/wiki/Tar_(computing)#Format_details – LS_ᴅᴇᴠ Jul 25 '13 at 15:46

Ingo Leonhardt · Accepted Answer · 2018-02-16T15:13:52.787

2

The contents of a tar file are always header block, data block, header block, data block ... where every header block contains all the information about one file (filename, size, permissions, ...) and the following data block contains the contents of that file. The size of each data block is the file size given in the header block, rounded up to the next multiple of 512. So if you have read one header block and want to skip to the next one, calculate

 size_t skip = filesize % 512 ? filesize + 512 - (filesize % 512) : filesize;

or, more efficiently

 size_t skip = (filesize + 511) & ~(size_t)511;

and seek skip bytes forward.
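As a minimal sketch of the rounding rule above, the helper below wraps the mask form in a function and checks it at compile time. The names `kTarBlockSize` and `padded_size` are made up for this illustration and are not part of any tar library.

```cpp
#include <cstddef>

// Size of one tar block in bytes.
constexpr std::size_t kTarBlockSize = 512;

// Rounds a file size up to the next multiple of 512.
// Hypothetical helper name, used only for this illustration.
constexpr std::size_t padded_size(std::size_t filesize) {
    // Adding 511 and clearing the low 9 bits rounds up to a whole block.
    return (filesize + kTarBlockSize - 1) & ~(kTarBlockSize - 1);
}

// Compile-time checks using the sizes from this answer.
static_assert(padded_size(0) == 0, "an empty file has no data block");
static_assert(padded_size(123) == 512, "123 rounds up to 512");
static_assert(padded_size(512) == 512, "exact multiples stay unchanged");
static_assert(padded_size(12345) == 12800, "12345 rounds up to 12800");
```

The mask trick only works because 512 is a power of two; for other block sizes the modulo form shown first would be needed.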

For example if your tar file contains two files a.bin of size 12345 (next multiple of 512 is 12800) and b.txt of size 123 (next multiple of 512 is -- obviously -- 512) then you would have:

header containing information about a.bin starting at Pos. 0
data of a.bin starting at Pos. 512
header containing information about b.txt starting at Pos. 512 + 12800 = 13312
data of b.txt starting at Pos. 13312 + 512 = 13824
the file size of the tar file will be at least 13824 + 512 = 14336. In practice, you will generally find the tar file to be larger, and the next 512 bytes at Pos. 14336 will be \0
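The layout above can be sketched as a loop over an archive that is already loaded into memory. This is only an illustration under some assumptions: the names `TarEntry`, `parse_octal` and `list_entries` are hypothetical, the name is taken from the first 100 bytes of the header and the size from the 12-byte octal field at offset 124, and the ustar prefix field and extensions such as GNU long names are ignored.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One archive member: its name, where its data starts, and its exact size.
struct TarEntry {
    std::string name;
    std::size_t data_offset;
    std::size_t size;
};

// Parses an octal number stored as ASCII digits. Leading spaces are skipped
// and parsing stops at the first non-octal character (NUL or space).
std::size_t parse_octal(const unsigned char* field, std::size_t len) {
    std::size_t i = 0;
    while (i < len && field[i] == ' ') ++i;
    std::size_t value = 0;
    for (; i < len && field[i] >= '0' && field[i] <= '7'; ++i) {
        value = value * 8 + static_cast<std::size_t>(field[i] - '0');
    }
    return value;
}

// Walks header block, data block, header block, ... through the archive.
std::vector<TarEntry> list_entries(const std::vector<unsigned char>& tar) {
    const std::size_t kBlock = 512;
    std::vector<TarEntry> entries;
    std::size_t pos = 0;
    while (pos + kBlock <= tar.size()) {
        const unsigned char* header = tar.data() + pos;
        if (header[0] == '\0') break;  // empty name: treated as end of archive

        // Name field: bytes 0..99, NUL-terminated unless it fills the field.
        std::size_t name_len = 0;
        while (name_len < 100 && header[name_len] != '\0') ++name_len;
        std::string name(reinterpret_cast<const char*>(header), name_len);

        // Size field: bytes 124..135, octal ASCII.
        const std::size_t size = parse_octal(header + 124, 12);
        entries.push_back(TarEntry{name, pos + kBlock, size});

        // Skip the header plus the data rounded up to whole blocks.
        const std::size_t padded = (size + kBlock - 1) / kBlock * kBlock;
        pos += kBlock + padded;
    }
    return entries;
}
```

For the two-file example, this would report a.bin at data offset 512 and b.txt at data offset 13824. The sketch does not verify the header checksum or check that `data_offset + size` stays inside the buffer, so a caller should do that before copying any contents.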

edited Feb 16 '18 at 15:13

answered Jul 25 '13 at 15:45

Ingo Leonhardt


As I understand it, by header block you mean the place where we get the format details of the file, and by data block you mean the place that holds the actual contents belonging to that header block. What I don't understand is this: suppose the contents of the first file start at offset 512 and that file is very long (say 12346622 bytes). Can I then get the next file's contents at offset 512*2 = 1024? At that address I still have the contents of the first file. Do you understand what I mean? – Sss Jul 25 '13 at 15:54
I've edited the answer and included an example. Hope it's getting clear now. – Ingo Leonhardt Jul 25 '13 at 16:09
You can also calculate `skip` as: `size_t skip = (filesize + 511) & ~(size_t)511;` - saves on the number of operations a little bit. – Mats Petersson Jul 25 '13 at 17:03
@mats: sure, I've intentionally done it like that here, because it seems to me that the effect of 'rounding up' is a little bit more obvious. But I'll add yours as an alternative – Ingo Leonhardt Jul 25 '13 at 17:06
@Shekar **NO** believe me you will find *b.txt*'s header **behind** *a.bin*'s contents. – Ingo Leonhardt Jul 25 '13 at 17:09
I mean the data of file b will start at location 512 (a's header) + 12345 + 512 (b's header) = 13369. Why 13312? Another question: do you know any way to find out at which position in the buffer the .html file is? I actually have to store its contents in a buffer – Sss Jul 25 '13 at 17:15
Next multiple of 512! For file size 12345 you will get 12800, and 512 + 12800 = 13312. Could you clarify your second question please? Do you want to modify an existing tar file? – Ingo Leonhardt Jul 25 '13 at 17:19
Now I understand the first question. The second question: if I have many files in the tar file but I only want the contents (data) of the HTML file, and I want to store it in another buffer (just the HTML file contents, NOT the others), do you know any mechanism to do so? – Sss Jul 25 '13 at 17:25
Parse the tar file as described here (e.g. using `fread( ..., 512, ... )` and `fseek( ..., skip, SEEK_CUR )`) until you find the header of the file you're interested in. The next data block is yours (that's what is meant by "seek skip bytes forward") – Ingo Leonhardt Jul 25 '13 at 17:28
@Ingo why do we need the size to be exactly divisible by 512, as you have taken 12800 (25*512 = 12800, not 12345)? Why not the exact value 12345? – Sss Jul 26 '13 at 08:51
Just because that's how the format is defined. Data is always written in 512-byte blocks. The reason for that is storage devices like magnetic tapes (that's what the 't' in tar stands for) – Ingo Leonhardt Jul 27 '13 at 08:29
@Ingo one more question: suppose I want to get a particular file (let's say an HTML file inside the tar file). How will I know that this file is an HTML file? Your algorithm only deals with skipping files, not with identifying the contents of a file. What I have to do is store the contents of the HTML file in a buffer, and there are many other files in this tar file as well – Sss Jul 29 '13 at 06:33
In the header block you will find the file name. If checking the extension doesn't suit your needs, you will have to read at least the first portion of the contents and 'guess' whether it's HTML. There is no separate 'file type' attribute or anything like that – Ingo Leonhardt Jul 29 '13 at 09:29
@Ingo here is my code; could you please check what the problem is? My "skip" only gets me to the next file, and I cannot reach the ".html" file. Please help me - http://stackoverflow.com/questions/17920081/how-to-skip-a-file-inside-the-tar-file-to-get-a-particular-file – Sss Jul 29 '13 at 12:32
I am only able to access the second file, not any file after the second – Sss Jul 29 '13 at 12:38
I've found [the new question](http://stackoverflow.com/questions/17920081/how-to-skip-a-file-inside-the-tar-file-to-get-a-particular-file), look there – Ingo Leonhardt Jul 29 '13 at 12:52
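
The comments above suggest identifying an HTML file by the name stored in its header block. A minimal sketch of that check follows; `has_extension` and `looks_like_html` are hypothetical names, and treating both `.html` and `.htm` as HTML, case-insensitively, is an assumption made for this illustration.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Returns true if `name` ends with `ext`, ignoring ASCII case.
bool has_extension(const std::string& name, const std::string& ext) {
    if (name.size() < ext.size()) return false;
    const std::size_t start = name.size() - ext.size();
    for (std::size_t i = 0; i < ext.size(); ++i) {
        // Cast to unsigned char first: std::tolower requires a non-negative value.
        const unsigned char a = static_cast<unsigned char>(name[start + i]);
        const unsigned char b = static_cast<unsigned char>(ext[i]);
        if (std::tolower(a) != std::tolower(b)) return false;
    }
    return true;
}

// Guesses from the header's file name whether an entry is an HTML file.
bool looks_like_html(const std::string& name) {
    return has_extension(name, ".html") || has_extension(name, ".htm");
}
```

Once a header whose name passes this check is found, the file's size in bytes can be copied from the data block that follows it. As noted in the comments, a name-based check is only a guess; a file without a telling extension would need its first bytes inspected instead.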
