
I need to compress a large file of about 17-20 GB. I need to split it into several files of around 1 GB per file.

I searched for a solution via Google and found ways using the split and cat commands, but they did not work for large files at all. Also, they won't work on Windows; I need to extract the archive on a Windows machine.

Aka
  • I feel your pain but this doesn't seem to be programming related. – Jason S Jul 13 '09 at 15:18
  • Many compression programs (e.g. 7-Zip) are able to split the compressed file into volumes of a specified size for easier distribution. – Martin Liversage Jul 13 '09 at 15:22
  • This belongs on superuser.com, but the private beta doesn't start until tomorrow, I'm told. – JesperE Jul 13 '09 at 15:27
  • May I ask why you need that file compressed? – Jan Jungnickel Jul 13 '09 at 15:59
  • If one of the two viable solutions posted here doesn't pan out, he'll be needing a programming solution. – Joshua Jul 13 '09 at 17:15
  • The approved answer to this question shows how you can do this using Python and the subprocess module: http://stackoverflow.com/questions/4368818/any-way-to-execute-a-piped-command-in-python-using-subprocess-module-without-usi (Python is a scripting language available for Windows, so there is a chance this might work...) – Samuel Lampa Dec 08 '10 at 09:09
  • Can we move this question to the Super User forum if it's not applicable to programming? Because it's still applicable. – Nikhil VJ May 04 '20 at 02:49

4 Answers


You can use the split command with the -b option:

split -b 1024m file.tar.gz

It can be reassembled on a Windows machine using Joshua's answer.

copy /b file1 + file2 + file3 + file4 filetogether

As @Charlie stated in the comment below, you might want to set a prefix explicitly, because otherwise split uses x as the default prefix, which can be confusing.

split -b 1024m "file.tar.gz" "file.tar.gz.part-"

# Creates files: file.tar.gz.part-aa, file.tar.gz.part-ab, file.tar.gz.part-ac, ...
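On Linux the parts can be joined back together with cat (a minimal sketch using the part names produced above; the shell's lexical glob order matches split's alphabetic suffixes):

cat file.tar.gz.part-* > file.tar.gz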

The most effective solution is very close to the content of this answer:

# Create archives
tar cz my_large_file_1 my_large_file_2 | split -b 1024m - myfiles_split.tgz_

# Uncompress
cat myfiles_split.tgz_* | tar xz

This solution avoids the need for an intermediate large file when (de)compressing. Use the tar -C option to place the resulting files in a different directory (a sketch of this follows the gzip example below). By the way, if the archive consists of only a single file, tar can be avoided and gzip used on its own:

# Create archives
gzip -c my_large_file | split -b 1024m - myfile_split.gz_

# Uncompress
cat myfile_split.gz_* | gunzip -c > my_large_file
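Following up on the tar -C note above, extraction into a different directory could look like this (a sketch; the target path is hypothetical, and -f - just makes the stdin archive explicit):

# Uncompress into another directory
cat myfiles_split.tgz_* | tar xzf - -C /path/to/target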

For Windows you can download ported versions of the same commands or use Cygwin.

matpie
  • If you don't add a prefix as the last argument after the filename to split, you get output in files named xaa, xab, xac, xad, ... – Charlie Jul 13 '09 at 19:01
  • Actually using `-b 1024MiB` gave an error that it was an invalid number of bytes. Using `--bytes=1024m` works. – Engineer2021 Mar 13 '14 at 12:52
  • And you don't have to use `cat` to reassemble the file. You can use `copy /b file1 + file2 + etc..` on Windows, then copy back to Linux and tar can read the reassembled tarball. I just tried it. – Engineer2021 Mar 13 '14 at 15:58
  • Split has the option `--numeric-suffixes`: use numeric suffixes instead of alphabetic. – Dr. Jan-Philip Gehrcke Feb 04 '15 at 12:47
  • If you prefer the prefix consisting of the original to avoid prefix name guessing using bash variables, use: `file=myfile.tar.gz` followed by `split -b 1024m $file ${file}-part-`. Re-assembly using `cat ${file}-part-* > $file` – Sebastian Müller May 26 '18 at 10:03
  • It's a good idea to use the `--verbose` option when splitting large files. – HadiRj Jul 09 '19 at 11:50

If you are splitting from Linux, you can still reassemble in Windows.

copy /b file1 + file2 + file3 + file4 filetogether
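Applied to the parts produced by the split command in the accepted answer, the same idea looks like this (a sketch; the parts are listed explicitly in suffix order):

copy /b file.tar.gz.part-aa + file.tar.gz.part-ab + file.tar.gz.part-ac file.tar.gz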
Joshua

Use tar to split the data into multiple archive volumes.
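A minimal sketch of GNU tar's multi-volume mode (assumptions: GNU tar is available, --tape-length is counted in 1024-byte blocks so 1048576 is roughly 1 GiB, multi-volume archives cannot be gzip-compressed directly, and tar prompts interactively for the next volume name as each one fills up):

# Create ~1 GiB volumes; give each new volume a name at the prompt
tar --create --multi-volume --tape-length=1048576 --file=my_large_file.tar my_large_file

# Extract, feeding the volumes back in order at the prompts
tar --extract --multi-volume --file=my_large_file.tar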

There are plenty of programs that will work with tar files on Windows, including Cygwin.

Tim Hoolihan

Tested code; it initially creates a single archive file, then splits it:

gzip -c file.orig > file.gz
CHUNKSIZE=1073741824    # 1 GiB per part
PARTCNT=$(( $(stat -c%s file.gz) / CHUNKSIZE ))

# the remainder is taken care of, for example for
# 1 GiB + 1 bytes PARTCNT is 1 and seq 0 $PARTCNT covers
# all of the file
for n in $(seq 0 $PARTCNT)
do
      dd if=file.gz of=part.$n bs=$CHUNKSIZE skip=$n count=1
done
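To put the parts back together, sort the numeric suffixes explicitly; a plain glob would place part.10 before part.2 (a sketch, assuming the part.N names produced above):

# Reassemble and decompress
cat $(ls part.* | sort -t. -k2 -n) | gunzip -c > file.orig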

This variant omits creating a single archive file and goes straight to creating parts:

gzip -c file.orig |
    ( CHUNKSIZE=1073741824;   # 1 GiB per part
        i=0;
        while true; do
            i=$((i+1));
            head -c "$CHUNKSIZE" > "part.$i";
            # a short (or empty) part means the input is exhausted
            [ "$CHUNKSIZE" -eq $(stat -c%s "part.$i") ] || break;
        done; )

In this variant, if the archive's file size is divisible by $CHUNKSIZE, then the last partial file will have file size 0 bytes.
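If that empty trailing part is unwanted, it can be removed once the loop finishes (a sketch, assuming the part.N names above):

# Drop a zero-byte trailing part, if one was created
find . -maxdepth 1 -name 'part.*' -size 0 -delete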

Adrian Panasiuk
  • That's what `split` does already. – ephemient Jul 13 '09 at 15:28
  • @ephemient Hey, I've dug through some posts looking for just that. I have no split nor zip commands/binaries on a certain device, and this one has worked perfectly. I'll adapt this code to work as a split command :). Thank you very much @Adrian Panasiuk. That's perfect for me. – m3nda Mar 04 '15 at 16:42
  • But I've tested it and the result was a full file, not split. How can that be? It was a big file on a small device, so it was a long process. Please test your solutions before posting :( – m3nda Mar 05 '15 at 09:44
  • @erm3nda You never told us you need to avoid creating a temporary file! Please see the second variant! – Adrian Panasiuk Mar 05 '15 at 23:39
  • I was wrong about the first script; it works perfectly, it was my mistake with the CHUNKSIZE variable. There's no reason to create an intermediate gzip file: you can run it all in one command with a pipe. Both examples work, one with dd and the other with head. My main problem was the missing split/zip binaries, and these two solutions are exactly what I was looking for. Thanks again. – m3nda Mar 06 '15 at 01:00
  • I solved my problem yesterday using 7-Zip with the `-v200m` option. But later I went back, tested yours too, and saw my mistake :D – m3nda Mar 06 '15 at 01:08
  • Thanks, the split command was resulting in a full file with no split for me too, while your script worked. Note: on macOS, substitute "stat -c%s" with "stat -f%z". – iammyr May 21 '18 at 16:02
  • Thank you! I just used something halfway between your first and second example to make something that would take a gzip-compressed file, stream it through the subshell, and turn it into gzipped part files. I couldn't find anything else that would work the same way. I precalculated the number of parts (like the first example) but used the subshell to work on a single gunzipped stream. – Cinderhaze Oct 28 '19 at 20:33