9

I have a 130GB tar file I need to untar to /storage.

Problem is, I don't have enough disk space for both to exist at the same time. How can I untar the tar file, ensure there is enough disk space, and delete the file immediately after untarring?

Thanks.

Andrew Fashion
  • 1,655
  • 7
  • 22
  • 26
  • 3
    *Seriously?!* If you had just [used rsync as Chopper3 suggested](http://serverfault.com/questions/208300/quickest-way-to-transfer-55gb-of-images-to-new-server/208305#208305) (an answer that outvotes your accepted one by 10x), you wouldn't have had *any* of the problems you're encountering, including this one – Mark Henderson Dec 05 '10 at 02:08
  • 1
    If you checked Andrew's recent questions, you might discover that [he had a problem getting rsync to work](http://serverfault.com/questions/208494/rsync-is-just-hanging-building-file-list). So cut him some slack. – Steven Monday Dec 05 '10 at 04:15
  • Guys, he already has a 130GB tar archive and needs a solution this time; next time he can use rsync or scp. My response was for this problem, not for the one about how to copy 130GB of files over the network. – Paul Dec 05 '10 at 20:31
  • You must have meant delete **while** `untar`ing, not _after_, since there is blatantly no way to do it _after_ without consuming double the space for both the pre- and post-extraction copies. – underscore_d Oct 27 '15 at 19:13

5 Answers

3

This one is hard. It really depends on how much free space you have.

Others said to extract part of the files and delete them from the tar. That's the only option I can see right now.

Updating the tar requires the archive to be reconstructed without the deleted files, on the same drive. That's why you have to have 2 × the tar's size, plus a bit more, to allow extraction of the files.

HTH

Paul
  • 1,857
  • 1
  • 11
  • 15
1
# list the archive and, entry by entry, extract the file and then
# delete it from the archive to free space as you go
while read -r p; do
  echo "$p"
  tar -xf archive.tar "$p"
  tar --delete -f archive.tar "$p"
done < <(tar -tf archive.tar)
mforsetti
  • 2,666
  • 2
  • 16
  • 20
user756678
  • 11
  • 1
  • I had to save the output of `tar -tf archive.tar` into a separate file for this to work. It appears that `tar --delete` takes a huge amount of time (seems like it's rewriting the entire archive), so the time to execute this can be very significant, and it corrupts the archive if interrupted. – Ishamael Aug 04 '21 at 14:05
1

If it is just a tar file (i.e. not compressed) and if you have enough space left on the disk to store a compressed version of the tarball, you could try to compress it (e.g. with gzip) and then construct a gunzip | tar pipe. I would test this beforehand, though.
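
A minimal sketch of that idea, assuming gzip and an archive named archive.tar (gzip removes the original only after it has finished writing archive.tar.gz, so the peak usage is tar + gz, which is exactly the free space this answer presupposes):

gzip archive.tar                               # leaves archive.tar.gz, removes archive.tar
gunzip -c archive.tar.gz | tar -x -C /storage  # stream-extract without a second tar copy
rm archive.tar.gz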

Easier solution: Get a bigger disk.

Sven
  • 98,649
  • 14
  • 180
  • 226
1

Can you put the tar file on a different server? Because then, you can untar via ssh without worrying about the space taken by the original file.
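
A sketch of that, assuming the archive has already been moved to a host called otherhost and sits at the hypothetical path /path/to/archive.tar (both names are placeholders):

ssh otherhost 'cat /path/to/archive.tar' | tar -x -C /storage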

mattdm
  • 6,600
  • 1
  • 26
  • 48
  • 2
    Oh, or you can follow the solution you've accepted in http://serverfault.com/questions/208300 and no space is used on the remote server! – mattdm Dec 05 '10 at 05:42
1

In a word: Pain and suffering...

If you understand the file format used by tar, you could build some tools to help with what you are doing, but there are complications that may or may not be relevant to your particular file. Tar uses 512-byte headers that specify the file name and length of each entry, among other things. You could use this information to build up a list of the offsets of the file entries within the archive. You could then traverse the tar file backwards, truncating the archive as you extract each file.

However, there are some issues of sequencing that you have to deal with. GNU tar, for example, can create some "fake" entries for files with long file names, to store additional information that can't fit in the 512-byte header. You may also need to be careful about directory entries, which might specify permissions that would not allow you to extract files into the directory if you extract the directory entry before its contents.

The Python programming language, among others, includes a nice library for handling tar files.

However, another option would be to just split the large tar file up into many smaller files, irrespective of the tar format: split out the end, then truncate the source file, and repeat until instead of a single 130GB file you have 130 1GB files. Obviously, getting these split/truncate offsets right may be a little tricky, but it can be done using the "dd" and "truncate" commands, as in the sketch below.
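
A rough sketch of the splitting step, assuming GNU dd/truncate/stat and an archive named archive.tar. It is destructive: each 1GiB piece is cut off the end of the source right after being copied, so you only ever need about one spare piece of free space.

CHUNK=$((1024 * 1024 * 1024))            # 1GiB pieces; a multiple of bs=1M
SIZE=$(stat -c %s archive.tar)
i=$(( (SIZE + CHUNK - 1) / CHUNK ))      # index of the last piece
while [ "$SIZE" -gt 0 ]; do
  OFFSET=$(( (i - 1) * CHUNK ))
  # copy the tail of the archive into its own zero-padded piece...
  dd if=archive.tar of="$(printf 'piece_%04d' "$i")" bs=1M skip=$(( OFFSET / (1024 * 1024) ))
  # ...then chop that tail off the source, freeing the space
  truncate -s "$OFFSET" archive.tar
  SIZE=$OFFSET
  i=$(( i - 1 ))
done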

Then it would be an easy matter to make a script that "cat"s the first file and deletes it, "cat"s the second file and deletes it, and so on. Pipe that script into "tar x", and the source pieces are deleted as the archive is extracted.
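
Continuing the sketch with the hypothetical piece_* names from above (the zero-padded names keep the glob in the right order; each piece is removed as soon as it has been fed into the pipe):

for f in piece_*; do
  cat "$f"
  rm "$f"              # reclaim the space behind the pipe
done | tar -x -C /storage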

Of course, these are all destructive operations, so you basically get one shot to do them right.

The easiest would be if you have a place you can copy the 130GB file to and extract it from there: say, an external USB hard drive, or another machine reached over an SSH tunnel.

Sean Reifschneider
  • 10,720
  • 3
  • 25
  • 28