I compressed a large regular Unix file (.dat) using the tar -cvzf command. The file is around 200 GB in size; after compression it became 27 GB. But while reading the data in that compressed file I can see anonymous data added at the start of the file. Is this possible? I tried to unzip that file again and found that the unzipped file has no such anonymous records.

Sudoshree

1 Answer

The GNU tar command is free software, so you can study its source code. Of course, also read its man page, tar(1).

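Indeed, a tar archive starts with a header, documented in the header file tar.h: each archived file is preceded by a 512-byte header block, which would explain extra data appearing in front of your file's contents inside the archive. There is a POSIX standard related to tar.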
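To make the header concrete, here is a minimal Python sketch (not GNU tar itself) that builds a tiny ustar archive in memory and looks at its first 512 bytes. The member name sample.dat and the payload are made up for illustration; the field offsets are the ones from the ustar layout described in tar.h.

```python
import io
import tarfile

# Build a small tar archive in memory holding one member.
# "sample.dat" and the payload are hypothetical, for illustration only.
payload = b"first record\nsecond record\n"
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as archive:
    info = tarfile.TarInfo(name="sample.dat")
    info.size = len(payload)
    archive.addfile(info, io.BytesIO(payload))

raw = buf.getvalue()
header = raw[:512]                                   # first 512-byte block: member header
name = header[:100].rstrip(b"\0").decode()           # name field, offset 0, 100 bytes
size = int(header[124:136].rstrip(b"\0 ").decode(), 8)  # size field, octal text
magic = header[257:262]                              # "ustar" marker
data = raw[512:512 + size]                           # file contents follow the header

print(name, size, magic)
print(data)
```

The original bytes only begin at offset 512, so anything that reads the archive as if it were the .dat file sees the header first. Extracting the archive strips it again.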

See also Peter Miller's tardy utility.

Don't confuse tar archives with zip ones handled by Info-ZIP (so zip and unzip commands).

GNU zip (a compressor: the gzip program, which can be started by tar, notably by your tar czvf command) is also free software, and of course you should study its source code if interested.
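The layering of gzip around tar can be sketched in Python as well. This is an illustration under assumptions (a made-up member name sample.dat and a dummy payload), using the standard tarfile and gzip modules rather than the command-line tools: the compressed file starts with the gzip magic bytes, the decompressed stream starts with the tar header, and only extraction gives back the original bytes.

```python
import gzip
import io
import tarfile

payload = b"A" * 1000  # dummy contents standing in for the .dat file

# Rough equivalent of "tar czf": a tar stream wrapped in gzip compression.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz", format=tarfile.USTAR_FORMAT) as archive:
    info = tarfile.TarInfo(name="sample.dat")  # hypothetical member name
    info.size = len(payload)
    archive.addfile(info, io.BytesIO(payload))

compressed = buf.getvalue()
gzip_magic = compressed[:2]                    # gzip files start with 1f 8b

# Decompressing only (what zcat or gunzip would give) yields the tar stream,
# which begins with the member header, not with the file's own bytes.
tar_stream = gzip.decompress(compressed)
starts_with_name = tar_stream.startswith(b"sample.dat")

# Extracting through tar removes the header and returns the original contents.
with tarfile.open(fileobj=io.BytesIO(compressed), mode="r:gz") as archive:
    extracted = archive.extractfile("sample.dat").read()

print(gzip_magic.hex(), starts_with_name, extracted == payload)
```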

Some Unix shells (notably sash or busybox) have a builtin tar.

I tried to unzip that file again and found that unzipped file has no such anonymous records.

AFAIK, most Linux filesystems try to implement, more or less, the POSIX standard, based upon the read(2) and write(2) system calls, and they don't know about records. If you need "records", consider using databases (like SQLite or PostgreSQL) or indexed files (like GDBM), both built above Linux file systems or block devices.
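As a small sketch of what "records" look like when a database manages them, here is an example with Python's built-in sqlite3 module. The table name readings and its columns are invented for illustration.

```python
import sqlite3

# In-memory database; the "readings" table and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (id INTEGER PRIMARY KEY, label TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO readings (label, value) VALUES (?, ?)",
    [("a", 1.5), ("b", 2.5)],
)
conn.commit()

# The database, not the filesystem, knows where each record starts and ends.
rows = conn.execute("SELECT id, label, value FROM readings ORDER BY id").fetchall()
conn.close()
print(rows)
```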

Read also a good textbook on operating systems.

Notice that "a large regular unix file" is mostly a sequence of bytes. There is no notion of records inside it, except as a convention used by user-space programs through syscalls(2). See also path_resolution(7) and inode(7).
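A short Python sketch of that point: the "records" below exist only because the writer and the reader agree on a layout. The 12-byte layout (a 4-byte integer plus an 8-byte name) and the file name records.dat are assumptions made up for this example; the filesystem itself only stores 36 bytes.

```python
import os
import struct
import tempfile

# Hypothetical record layout: little-endian 4-byte id + 8-byte padded name.
RECORD = struct.Struct("<I8s")
rows = [(1, b"alpha"), (2, b"beta"), (3, b"gamma")]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "records.dat")
    with open(path, "wb") as f:
        for rec_id, label in rows:
            f.write(RECORD.pack(rec_id, label))   # plain write(2) of bytes
    file_size = os.path.getsize(path)             # the kernel sees only a byte count
    with open(path, "rb") as f:
        raw = f.read()

# Only the agreed layout turns the bytes back into records.
decoded = [(i, s.rstrip(b"\0")) for i, s in RECORD.iter_unpack(raw)]
print(file_size, decoded)
```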

Basile Starynkevitch
  • Thank you Basile for the above info. I will go through the tar.h file on my system to dig more into this. – Sudoshree May 12 '20 at 06:36
  • Thank you Basile for this answer. I got all the details about the tar process from the wiki page that you shared in the above answer. Thanks :) – Sudoshree May 12 '20 at 07:08