
I have a file archive.tar.gz, which is 38 GB in size. I want to store it in chunks of maximum 1 GB.

To do that, I would like to split it into sub-files archive_0.tar.gz, archive_1.tar.gz, ... so that each sub-file is individually readable (and not just the original file cut blindly at each 1 GB boundary).

In other words, each archive_x.tar.gz file should be a valid tar.gz file.

How can I do this? Preferably using shell scripting, or Python.

Peter Mortensen
user5123481566

1 Answer


Assuming you have enough disk space, are running Linux, and can get root permission if needed:

  1. List the files in the original tar archive, once with sizes and once without:

    tar tzvf archive.tar.gz > /tmp/archive-full-list
    tar tzf archive.tar.gz > /tmp/archive-list
    
  2. Write a GNU AWK or Python script (or a C program using libtar) that parses these listings and checks that no single file exceeds 1 GB of uncompressed data. You could use an SQLite, Redis, or PostgreSQL database to store each file's metadata: name (i.e., file path), permissions, owner, and size.

  3. Write another script that groups the file paths into chunks of at most one gigabyte of uncompressed data each.

  4. Run the appropriate tar commands to create each archive_x.tar.gz.

N.B. The resulting split is probably not the optimal one, since a 2 GB text file might compress to less than 1 GB; this is difficult to predict in advance.
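The steps above can be collapsed into a single Python sketch using the standard tarfile module: iterate over the members of the source archive, start a new output archive whenever the next member would push the current chunk past the uncompressed-size budget, and copy each member (metadata plus data) into the current chunk. The function name and the `prefix` parameter are illustrative, not from the original answer; note that a single member larger than the budget still gets its own (oversized) chunk.

```python
# Sketch: split a .tar.gz into chunks, each holding at most
# max_bytes of *uncompressed* member data. Each chunk is a
# valid, independently readable .tar.gz file.
import tarfile

def split_tar_gz(src_path, prefix, max_bytes=1 << 30):
    """Write prefix_0.tar.gz, prefix_1.tar.gz, ... and return their names."""
    chunks = []
    out = None    # currently open output archive
    used = 0      # uncompressed bytes already in the current chunk
    idx = 0
    with tarfile.open(src_path, "r:gz") as src:
        for member in src:  # TarFile iteration yields TarInfo objects
            # Start a new chunk if this member would overflow the budget
            # (but never create an empty chunk).
            if out is None or (used and used + member.size > max_bytes):
                if out is not None:
                    out.close()
                name = f"{prefix}_{idx}.tar.gz"
                out = tarfile.open(name, "w:gz")
                chunks.append(name)
                idx += 1
                used = 0
            # Copy the member: metadata always, file data for regular files.
            fileobj = src.extractfile(member) if member.isfile() else None
            out.addfile(member, fileobj)
            used += member.size
    if out is not None:
        out.close()
    return chunks
```

Usage would be e.g. `split_tar_gz("archive.tar.gz", "archive")` for the 1 GB default. Because the budget is measured on uncompressed sizes, the compressed chunks will usually come out smaller than max_bytes, matching the caveat above.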

Basile Starynkevitch