
I have a file archive.tar.gz, which is 38 GB in size. I want to store it in chunks of maximum 1 GB.

To do that, I would like to split it into sub-files archive_0.tar.gz, archive_1.tar.gz, ... so that each sub-file is individually readable (and not just the original file cut blindly at each 1 GB boundary).

In other words, each archive_x.tar.gz file should be a valid tar.gz file.

How can I do this? Preferably using shell scripting, or Python.

Peter Mortensen
user5123481566

1 Answer


Assuming you have enough disk space, are running Linux, and can get root permission if needed:

  1. List the files in the original tar archive, once with sizes and once without:

    tar tzvf archive.tar.gz > /tmp/archive-full-list
    tar tzf archive.tar.gz > /tmp/archive-list
    
  2. Write a GNU AWK or Python script (or a C program using libtar) that parses these listings and checks that no single file exceeds 1 GB of uncompressed data. You could use an SQLite, Redis, or PostgreSQL database to store each file's metadata: name (i.e., file path), permissions, owner, and size.

  3. Write another script that groups the file paths into chunks of at most one gigabyte of uncompressed data each.

  4. Run the appropriate tar commands to create each archive_x.tar.gz.

N.B. The resulting split is probably not the optimal one, since a 2 GB text file might compress to less than 1 GB; this is difficult to predict in advance.
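The steps above can be collapsed into a single Python sketch using the standard tarfile module: iterate over the members of the source archive, start a new output archive whenever the next member would push the current chunk past the uncompressed-size budget, and copy each member (metadata plus data) into the current chunk. The function name and the `prefix` parameter are illustrative, not from the original answer; note that a single member larger than the budget still gets its own (oversized) chunk.

```python
# Sketch: split a .tar.gz into chunks, each holding at most
# max_bytes of *uncompressed* member data. Each chunk is a
# valid, independently readable .tar.gz file.
import tarfile

def split_tar_gz(src_path, prefix, max_bytes=1 << 30):
    """Write prefix_0.tar.gz, prefix_1.tar.gz, ... and return their names."""
    chunks = []
    out = None    # currently open output archive
    used = 0      # uncompressed bytes already in the current chunk
    idx = 0
    with tarfile.open(src_path, "r:gz") as src:
        for member in src:  # TarFile iteration yields TarInfo objects
            # Start a new chunk if this member would overflow the budget
            # (but never create an empty chunk).
            if out is None or (used and used + member.size > max_bytes):
                if out is not None:
                    out.close()
                name = f"{prefix}_{idx}.tar.gz"
                out = tarfile.open(name, "w:gz")
                chunks.append(name)
                idx += 1
                used = 0
            # Copy the member: metadata always, file data for regular files.
            fileobj = src.extractfile(member) if member.isfile() else None
            out.addfile(member, fileobj)
            used += member.size
    if out is not None:
        out.close()
    return chunks
```

Usage would be e.g. `split_tar_gz("archive.tar.gz", "archive")` for the 1 GB default. Because the budget is measured on uncompressed sizes, the compressed chunks will usually come out smaller than max_bytes, matching the caveat above.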

Basile Starynkevitch