How can I extract the size of the total uncompressed file data in a .tar.gz file from command line?
7 Answers
This works for any file size:
zcat archive.tar.gz | wc -c
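Note that on some BSD-derived systems (macOS included), plain `zcat` expects compress(1)-style `.Z` files; there you can use `gzcat` instead, or the long form that works everywhere gzip does:
gzip -dc archive.tar.gz | wc -c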
For files smaller than 4 GB, you can also use the -l option with gzip:
$ gzip -l compressed.tar.gz
         compressed        uncompressed  ratio uncompressed_name
                132               10240  99.1% compressed.tar

- This gives me the size of the tar file including file metadata such as file names, etc. I was looking for a way to check only the total size of the files. Anyway, the only way to do this seems to be to extract the tar file and run a script on the extracted content. – Ztyx May 01 '10 at 11:46
- Actually, this could be enough. You will also need space for folder inodes, which can vary between filesystems. Also note that counting the real size with `tar -tf...` **will run gzip -d** on the full file, so you actually decompress the whole tar. The **gzip -l** shown here does not decompress, so it is quite fast. – Vadim Fint Nov 14 '12 at 11:01
- In my case, this gives me an uncompressed size which is smaller than the compressed size, and a negative ratio. – lefterav Feb 27 '14 at 14:01
- Worth noting that the uncompressed size reported is modulo 2^32, which means this doesn't work for files greater than 4 GB. Use this command instead: `zcat archive.tar.gz | wc -c` – nedned Mar 19 '14 at 01:30
- Thanks @nedned. I was wondering why a 2.9 GB tar.gz full of text data files was reporting a -36% compression ratio o_O. That seems like a silly bug. – naught101 Mar 25 '19 at 22:05
- @naught101 It's a file format limitation and documented in the man page: "The gzip format represents the input size modulo 2^32" – sehe Dec 03 '20 at 23:32
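For the curious: `gzip -l` reads that figure from the ISIZE field, the last four bytes of the gzip stream, which hold the original input size modulo 2^32 (RFC 1952). A minimal sketch that reads the field directly, assuming a single-member gzip file and a little-endian machine (so `od` sees the bytes in their stored order):
$ tail -c 4 archive.tar.gz | od -An -tu4
Being a 32-bit field, it wraps around at 4 GB, which is exactly the limitation discussed above.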
This will sum the total content size of the extracted files:
$ tar tzvf archive.tar.gz | sed 's/ \+/ /g' | cut -f3 -d' ' | sed '2,$s/^/+ /' | paste -sd' ' | bc
The output is given in bytes.
Explanation: `tar tzvf` lists the files in the archive in verbose format like `ls -l`. `sed` and `cut` isolate the file size field. The second `sed` puts a `+` in front of every size except the first, and `paste` concatenates them, giving a sum expression that is then evaluated by `bc`.
Note that this doesn't include metadata, so the disk space taken up by the files when you extract them is going to be larger - potentially many times larger if you have a lot of very small files.
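If you want the total of the file contents alone without parsing the listing, another sketch is to use tar's `-O`/`--to-stdout` flag (supported by GNU tar and bsdtar), which writes every extracted file's data to stdout so `wc` can count it. This decompresses the whole archive, so it is about as slow as `zcat | wc -c`, but it counts only file data, with no tar headers or padding:
$ tar -xzOf archive.tar.gz | wc -c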

- Or a bit more concisely: `tar tzvf archive.tar.gz | awk '{s+=$3} END{print (s/1024/1024), MB}'`. – Rubens Mar 18 '14 at 02:17
- Thanks, Rubens. This is perfect and simple. I did this for mine and it worked great: `tar tzvf 20180731.tar.gz | awk '{s+=$3} END{print (s/1024/1024/1024) " GB"}'`. I did have to put quotes around "MB" or "GB" to get that printed. – Tony B Aug 01 '18 at 20:31
- Calculate top-level directory (and file) sizes: `tar tzvf /tmp/root.tgz | sed 's/ \+/ /g' | cut -f3,6- -d' ' | cut -f1 -d'/' | awk '{ arr[$2]+=$1 } END { for (key in arr) printf("%s\t%s\n", key, arr[key]) }'` – Ilya Sheershoff Oct 05 '18 at 12:52
- I saw sizes like 0,0, which breaks the pipe. Adding an additional `sed 's/,/./g'` helps. This replaces the comma with a dot, and then the summing up works. – falkb Nov 05 '21 at 11:12
- @Rubens that is the best answer. The OP wants to know the size of the files ACCORDING to tar, not once you extract it, because the archive can be defective: `tar: Unexpected EOF in archive` – Smeterlink Aug 30 '22 at 21:18
The command `gzip -l archive.tar.gz` doesn't work correctly with file sizes greater than 2 GB. I would recommend `zcat archive.tar.gz | wc --bytes` instead for really large files.
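`wc --bytes` is the GNU long form of `wc -c`; if your `wc` lacks long options (BSD/macOS, for example), the portable spelling of the same pipeline is:
$ zcat archive.tar.gz | wc -c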

- I believe `gzip -l` doesn't work with file sizes greater than **4GB**, since gzip only uses 4 bytes to store the original file size. – kevin Mar 15 '15 at 09:10
- Looking at the source for gzip.c, it appears to be an off_t, which is a signed 4-byte value, so the max is 2GB. – swdev Mar 16 '15 at 18:24
- The gzip specification (https://www.ietf.org/rfc/rfc1952.txt) says the ISIZE field should be the original file size modulo 2^32; not sure why gzip uses a signed int... – kevin Mar 16 '15 at 19:11
- Listing files greater than 4 GiB was fixed in gzip 1.12 (2022-04), [release notes](https://lists.gnu.org/archive/html/info-gnu/2022-04/msg00003.html). – Fofola Aug 26 '22 at 15:12
I know this is an old question, but I wrote a tool just for this two years ago. It's called gzsize and it gives you the uncompressed size of a gzipped file without actually decompressing the whole file on disk:
$ gzsize <your file>

- What does it improve over piping to `wc`? Piping also works on-the-fly, I think. – mxmlnkn Feb 04 '19 at 13:27
- @mxmlnkn It's at least twice as fast, sometimes even more. On two sample 12GB files with different compression levels (one with random data, 11GB compressed; one with repeated data, 18MB compressed), `zcat | wc -c` took 60s and 42s while `gzsize` took 29s and 15s. – bfontaine Feb 04 '19 at 14:08
I searched all over the web and could not find a way to get the size when the file is bigger than 4 GB.

First, which is fastest?

[oracle@base tmp]$ time zcat oracle.20180303.030001.dmp.tar.gz | wc -c
6667028480

real    0m45.761s
user    0m43.203s
sys     0m5.185s

[oracle@base tmp]$ time gzip -dc oracle.20180303.030001.dmp.tar.gz | wc -c
6667028480

real    0m45.335s
user    0m42.781s
sys     0m5.153s

[oracle@base tmp]$ time tar -tvf oracle.20180303.030001.dmp.tar.gz
-rw-r--r-- oracle/oinstall     111828 2018-03-03 03:05 oracle.20180303.030001.log
-rw-r----- oracle/oinstall 6666911744 2018-03-03 03:05 oracle.20180303.030001.dmp

real    0m46.669s
user    0m44.347s
sys     0m4.981s

All three take about the same time, but `tar -tvf` prints the per-file sizes from the tar headers, so the question becomes: how do you cancel execution once the headers have been listed?

My solution is this:

[oracle@base tmp]$ time echo $(timeout --signal=SIGINT 1s tar -tvf oracle.20180303.030001.dmp.tar.gz | awk '{print $3}') | grep -o '[[:digit:]]*' | awk '{ sum += $1 } END { print sum }'
6667023572

real    0m1.005s
user    0m0.013s
sys     0m0.066s

- Headers? Your solution is way off, depending on the file size and number of files. Try it against numerous files inside the archive instead of 2. Try it against smaller and larger tar.gz files. – B. Shea Apr 06 '20 at 16:17
A tar file is uncompressed until/unless it is filtered through another program, such as gzip, bzip2, lzip, compress, lzma, etc. The file size of the tar file is about the same as that of the extracted files, with probably less than 1 kB of header info added to make it a valid tarball.
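In tar's classic format that header info is actually per entry: each file gets a 512-byte header and its data is padded to a multiple of 512 bytes. A sketch that estimates the uncompressed .tar size from the listing, assuming GNU tar's verbose column layout (size in column 3) and adding the two 512-byte zero blocks that end an archive (real output may be further padded to the blocking factor, 10240 bytes by default):
$ tar tzvf archive.tar.gz | awk '{ total += 512 + int(($3 + 511) / 512) * 512 } END { print total + 1024 }'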

- There's a header of 512 bytes for each file inside the tarball, plus the inner files are padded to a multiple of 512 bytes. This adds up to an average-case overhead of 768 bytes per file inside the tarball. – Sarah G Jan 09 '15 at 04:40
- The point of tarballs is that they are smaller versions for transport, just like zip files. – Nate T Dec 21 '20 at 12:13
- @Nathan No, it's not. On the contrary, it was designed around bigger data blocks than an average filesystem. TAR stands for tape archive; it is nowadays repurposed, but still an archive format for bigger data blocks. And it has nothing to do with transport; actually, back when it was designed, modems did the compression. You can gzip a TAR the same as you can gzip any other file. Tom's answer will give a rather useless size approximation, but it's the same method and the same size you get from the `gzip -l` answers, and those have 66 and 27 votes while Tom got downvotes? Not fair. – papo Jan 14 '21 at 18:03
- @papo My original comment was poorly worded, but the answer is still wrong. The size of a tar.gz file is not the same, and that is what the OP was asking about. I wrote "tarball" but meant "tar.gz file." Tom didn't really give an answer, just some info about uncompressed tarballs, which is not what the OP is asking about. That is likely the reason for the downvotes. You cannot just answer a "how do I?" question with a "you don't need to" answer; we have no idea what the OP needs unless he or she states it in the question. – Nate T Jan 15 '21 at 20:08
- @papo Seems like Tom S knew this answer might end up in the red. CYA alt account? Single-activity accounts are common for questions, but for an answer? – Nate T Jan 16 '21 at 00:44
- This may be irrelevant to the question, but I got some info I was looking for. Thanks. – Aditya Kane Jul 11 '21 at 16:53