Nowadays (4 years later), the command would be:
git -C /home/user/example.com/ archive --format tgz -19 -o /home/user/site_backups/develop-`date +%Y-%m-%dT%H%M`.tgz develop
With Git 2.30 (Q1 2021), "git archive" now allows compression level higher than "-9" when generating tar.gz output.
See commit cde8ea9 (09 Nov 2020) by René Scharfe (rscharfe).
(Merged by Junio C Hamano -- gitster -- in commit ede4d63, 18 Nov 2020)
archive: support compression levels beyond 9
Signed-off-by: René Scharfe
Compression programs like zip, gzip, bzip2 and xz allow to adjust the trade-off between CPU cost and size gain with numerical options from -1 for fast compression and -9 for high compression ratio.
zip also accepts -0 for storing files verbatim.
git archive directly support these single-digit compression levels for ZIP output and passes them to filters like gzip.
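To see the single-digit levels in action, here is a minimal sketch that archives a throwaway repository as ZIP with -0 (stored) and -9 (best compression) and compares the sizes. The temporary directory, the file name sample.txt and the demo identity are assumptions made up for the illustration.

```shell
#!/bin/sh
# Sketch: compare a stored (-0) ZIP archive with a maximally deflated (-9) one.
set -eu

work=$(mktemp -d)                       # throwaway directory (illustrative)
git init -q "$work/repo"
cd "$work/repo"

# A highly compressible file, so the level makes a visible difference.
yes 'the quick brown fox jumps over the lazy dog' | head -n 5000 > sample.txt
git add sample.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m 'add sample'

git archive --format=zip -0 -o "$work/stored.zip" HEAD   # no compression
git archive --format=zip -9 -o "$work/best.zip" HEAD     # highest deflate level

# $(( ... )) strips the padding some wc implementations add.
stored_size=$(($(wc -c < "$work/stored.zip")))
best_size=$(($(wc -c < "$work/best.zip")))
echo "stored: $stored_size bytes, best: $best_size bytes"
```

The stored archive should come out noticeably larger than the -9 one, since the sample file is pure repetition.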
Zstandard additionally supports compression level options -10 to -19, or up to -22 with --ultra.
This seems to work with git archive in most cases, e.g. it will produce an archive with -19 without complaining, but since it only supports single-digit compression level options this is the same as -1 -9 and thus -9.
Allow git archive to accept multi-digit compression levels to support the full range supported by zstd.
Explicitly reject them for the ZIP format, as otherwise deflateInit2() would just fail with a somewhat cryptic "stream consistency error".
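As a rough illustration of the multi-digit levels, the sketch below defines a "tar.zst" format through the tar.<format>.command configuration and asks for level -19, then tries the same level with ZIP. The format name tar.zst, the filter command 'zstd -c', the temporary repository and the demo identity are assumptions for the example; the zstd step is skipped when the binary is not installed.

```shell
#!/bin/sh
# Sketch: multi-digit compression level with a zstd filter, and the ZIP counterpart.
set -eu

work=$(mktemp -d)                       # throwaway directory (illustrative)
git init -q "$work/repo"
cd "$work/repo"
printf 'hello\n' > file.txt
git add file.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m 'initial commit'

zst_status=skipped
if command -v zstd >/dev/null 2>&1; then
    # Git appends the level option to the filter command,
    # so with Git 2.30+ this should run "zstd -c -19".
    git -c tar.tar.zst.command='zstd -c' archive --format=tar.zst -19 \
        -o "$work/demo.tar.zst" HEAD
    zstd -t -q "$work/demo.tar.zst"     # integrity check of the result
    zst_status=created
fi

# ZIP has no multi-digit levels; Git 2.30+ is expected to refuse this request.
if git archive --format=zip -19 -o "$work/demo.zip" HEAD 2>/dev/null; then
    zip_status=accepted
else
    zip_status=rejected
fi
echo "zstd: $zst_status, zip -19: $zip_status"
```

On a Git older than 2.30, the ZIP call would likely be accepted, because -19 is then read as -1 -9.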
Note that, with Git 2.38 (Q3 2022), "git archive" now (optionally and then by default) avoids spawning an external "gzip" process when creating ".tar.gz" (and ".tgz") archives.
See commit 4f4be00, commit 23fcf8b, commit 76d7602, commit dfce118, commit 96b9e51, commit 650134a (15 Jun 2022) by René Scharfe (rscharfe).
(Merged by Junio C Hamano -- gitster -- in commit b5a2d6c, 11 Jul 2022)
archive-tar: add internal gzip implementation
Original-patch-by: Rohit Ashiwal
Signed-off-by: René Scharfe
Git uses zlib for its own object store, but calls gzip when creating tgz archives.
Add an option to perform the gzip compression for the latter using zlib, without depending on the external gzip binary.
Plug it in by making write_block a function pointer and switching to a compressing variant if the filter command has the magic value "git archive gzip".
Does that indirection slow down tar creation? Not really, at least not in this test:
$ hyperfine -w3 -L rev HEAD,origin/main -p 'git checkout {rev} && make' \
'./git -C ../linux archive --format=tar HEAD # {rev}'
Benchmark #1: ./git -C ../linux archive --format=tar HEAD # HEAD
Time (mean ± σ): 4.044 s ± 0.007 s [User: 3.901 s, System: 0.137 s]
Range (min … max): 4.038 s … 4.059 s 10 runs
Benchmark #2: ./git -C ../linux archive --format=tar HEAD # origin/main
Time (mean ± σ): 4.047 s ± 0.009 s [User: 3.903 s, System: 0.138 s]
Range (min … max): 4.038 s … 4.066 s 10 runs
How does tgz creation perform?
$ hyperfine -w3 -L command 'gzip -cn','git archive gzip' \
'./git -c tar.tgz.command="{command}" -C ../linux archive --format=tgz HEAD'
Benchmark #1: ./git -c tar.tgz.command="gzip -cn" -C ../linux archive --format=tgz HEAD
Time (mean ± σ): 20.404 s ± 0.006 s [User: 23.943 s, System: 0.401 s]
Range (min … max): 20.395 s … 20.414 s 10 runs
Benchmark #2: ./git -c tar.tgz.command="git archive gzip" -C ../linux archive --format=tgz HEAD
Time (mean ± σ): 23.807 s ± 0.023 s [User: 23.655 s, System: 0.145 s]
Range (min … max): 23.782 s … 23.857 s 10 runs
Summary
'./git -c tar.tgz.command="gzip -cn" -C ../linux archive --format=tgz HEAD' ran
1.17 ± 0.00 times faster than './git -c tar.tgz.command="git archive gzip" -C ../linux archive --format=tgz HEAD'
So the internal implementation takes 17% longer on the Linux repo, but uses 2% less CPU time.
That's because the external gzip can run in parallel on its own processor, while the internal one works sequentially and avoids the inter-process communication overhead.
What are the benefits?
Only an internal sequential implementation can offer this eco mode, and it allows avoiding the gzip(1) requirement.
This implementation uses the helper functions from our zlib.c instead of the convenient gz* functions from zlib, because the latter doesn't give the control over the generated gzip header that the next patch requires.
And:
archive-tar: use internal gzip by default
Signed-off-by: René Scharfe
Drop the dependency on gzip(1) and use our internal implementation to create tar.gz and tgz files.
git archive now includes in its man page:
magic command git archive gzip by default, which invokes an internal implementation of gzip.
So a git archive using an external gzip would be:
git -c tar.tgz.command="gzip -cn" archive --format=tgz HEAD >external_gzip.tgz
While the new default one would use the internal zlib:
git archive --format=tgz HEAD >j.tgz
In both cases, the compression level options mentioned above still apply.
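To check that the two variants are interchangeable, here is a small sketch that builds both archives from a throwaway repository and compares the decompressed tar streams. The temporary paths and the demo identity are assumptions for the illustration; on a Git older than 2.38 the "default" archive is also produced by the external gzip.

```shell
#!/bin/sh
# Sketch: external gzip vs. the default tgz command, compared after decompression.
set -eu

work=$(mktemp -d)                       # throwaway directory (illustrative)
git init -q "$work/repo"
cd "$work/repo"
printf 'hello\n' > file.txt
git add file.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m 'initial commit'

# External gzip binary, selected explicitly through tar.tgz.command.
git -c tar.tgz.command='gzip -cn' archive --format=tgz -9 HEAD > "$work/external.tgz"

# Default command: the internal implementation on Git 2.38+.
git archive --format=tgz -9 HEAD > "$work/default.tgz"

# The compressed bytes may differ, but the underlying tar stream should not.
gzip -dc "$work/external.tgz" > "$work/external.tar"
gzip -dc "$work/default.tgz" > "$work/default.tar"
if cmp -s "$work/external.tar" "$work/default.tar"; then
    echo "same tar stream"
else
    echo "tar streams differ"
fi
```

Comparing the decompressed streams rather than the .tgz files sidesteps any difference in how each implementation encodes the same data.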