
I have a 2 GB archive (preferably .zip or .rar) split into parts (let's assume 100 parts x 20 MB), and I am trying to find a way to unpack it properly. I started with a .zip archive; I had files like test.zip, test.z01, test.z02 ... test.z99. When I merge them in Python like this:

import os

# zips holds the part file names, in the order they should be merged
for zipName in zips:
    with open(os.path.join(path_to_zip_file, "test.zip"), "ab") as f:
        with open(os.path.join(path_to_zip_file, zipName), "rb") as z:
            f.write(z.read())

and then, after merging, unpack it like this:

import zipfile

with zipfile.ZipFile(os.path.join(path_to_zip_file, "test.zip"), "r") as zipObj:
    zipObj.extractall(path_to_zip_file)

I get errors like:

test.zip file isn't zip file.

So then I tried with a .rar archive. I tried to unpack just the first file to see if my code would intelligently look for and pick up the remaining archive fragments, but it did not. So again I merged the .rar files (just like in the .zip case), and then tried to unpack it by using patoolib:

patoolib.extract_archive("test.rar", outdir="path here")

When I do that, I get errors like:

patoolib.util.PatoolError: could not find an executable program to extract format rar; candidates are (rar,unrar,7z)

After some work I figured out that the merged files are corrupted (I copied one and tried to unpack it normally on Windows using WinRAR, and ran into problems). I also tried other ways to merge, for example cat test.part.* > test.rar, but those don't help.

How can I merge and then unpack these archive files properly in Python?

TylerH
KyluAce
  • You can't simply append two files and assume that it is a valid new file. The idea is that the unzip / unrar tool will handle multiple files on its own when giving it the first file of such a set. – Thomas Weller Jan 17 '22 at 10:11
  • @ThomasWeller I tried that. The archive libraries just don't support that kind of action. That's probably also why cat doesn't work: some of these zip/rar parts probably have headers that should be skipped, etc. – KyluAce Jan 17 '22 at 11:14
  • @MisterMiyagi the problem is that you don't understand that. How the parts were created isn't important in this case (they were created manually with WinRAR). Like I said, the problem is with merging the parts. – KyluAce Jan 17 '22 at 11:15
  • I totally agree with @MisterMiyagi. How did you end up with such files in the first place? They might not be ZIP files at all. I can rename a DOC file to Z01 and you will never be able to uncompress, no matter how hard you try. – Thomas Weller Jan 17 '22 at 11:16
  • As I said in my first comment: you probably don't have to merge at all. – Thomas Weller Jan 17 '22 at 11:21
  • @ThomasWeller I have to merge the multipart files because libraries like zipfile don't support extracting multipart archives and raise errors like "zip isn't zip file". – KyluAce Jan 17 '22 at 11:23

1 Answer


Calling 7z from Python

  1. Rename the .zip file to .zip.001, the .z01 file to .zip.002, and so on.
  2. Call 7z on the .001 file (7z x test.zip.001):
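import subprocess

# 7z picks up test.zip.002, test.zip.003, ... itself; check=True raises if 7z exits with an error
cmd = ['7z', 'x', 'test.zip.001']
subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, check=True)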
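Step 1 can also be done from Python. The sketch below is a hypothetical helper (rename_split_zip is not part of any library) that applies exactly the mapping from step 1, assuming all parts sit in one directory and are named <base>.zip and <base>.zNN; it returns the path of the first volume to hand to 7z.

```python
import os
import re


def rename_split_zip(directory, base="test"):
    """Rename base.zip -> base.zip.001 and base.zNN -> base.zip.(NN + 1), zero-padded to 3 digits.

    Hypothetical helper following the renaming scheme in step 1; returns the first volume's path.
    """
    pattern = re.compile(re.escape(base) + r"\.z(\d+)$")  # matches base.z01, base.z02, ...
    renames = []
    for name in os.listdir(directory):
        match = pattern.match(name)
        if match:
            index = int(match.group(1)) + 1  # .z01 -> 002, .z02 -> 003, ...
            renames.append((name, f"{base}.zip.{index:03d}"))
    renames.append((f"{base}.zip", f"{base}.zip.001"))  # the .zip file becomes volume 001
    for old, new in renames:
        os.rename(os.path.join(directory, old), os.path.join(directory, new))
    return os.path.join(directory, f"{base}.zip.001")
```

For example, first = rename_split_zip(path_to_zip_file) followed by subprocess.run(['7z', 'x', first], check=True).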

CAT

cat test.zip* > test.zip should also work, but in my experience not always: I tried it with a single file and it worked, but it failed with subfolders. Keeping the parts in the right order is mandatory.

Testing:

7z -v1m a test.zip 12MFile
cat test.zip* > test.zip
7z t test.zip
>> Everything is Ok

I can't check this with files from the "official" WinRAR (does it even still exist?!) or WinZip.
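On keeping the order right: zero-padded names such as .z01 or .001 already sort correctly as plain strings, but unpadded numbers do not, because "part10" sorts before "part2". A minimal sketch of a natural sort key, assuming hypothetical file names like test.part1.rar ... test.part10.rar:

```python
import re


def natural_key(name):
    """Split a name into text and number chunks so that 'part10' sorts after 'part9'."""
    return [int(chunk) if chunk.isdigit() else chunk for chunk in re.split(r"(\d+)", name)]


parts = ["test.part10.rar", "test.part2.rar", "test.part1.rar"]
print(sorted(parts))                   # ['test.part1.rar', 'test.part10.rar', 'test.part2.rar']
print(sorted(parts, key=natural_key))  # ['test.part1.rar', 'test.part2.rar', 'test.part10.rar']
```

With such names, pass key=natural_key wherever the parts are sorted before concatenating them.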

Merging files in Python

If you want to stay in Python, this works too (again, tested with my 7z test files):

import shutil
import glob

with open('output_file.zip', 'wb') as wfd:
    # glob returns names in arbitrary order, so sort them; zero-padded .001, .002, ... sort correctly
    for f in sorted(glob.glob('test.zip.*')):
        with open(f, 'rb') as fd:
            shutil.copyfileobj(fd, wfd)  # append this part to the output file
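To check the merged file from Python, roughly what 7z t does above, the standard library's zipfile can open the archive and verify every member's CRC with ZipFile.testzip(). A minimal sketch; zip_is_valid is a hypothetical helper name, and it only covers ZIP, not RAR:

```python
import zipfile


def zip_is_valid(path):
    """Return True if path opens as a zip and every member passes its CRC check.

    A file that is not a zip at all (for example a truncated merge) makes
    ZipFile() raise BadZipFile, which is reported here as False.
    """
    try:
        with zipfile.ZipFile(path) as zf:
            # testzip() returns the name of the first bad member, or None if all are fine
            return zf.testzip() is None
    except zipfile.BadZipFile:
        return False
```

Usage: zip_is_valid('output_file.zip') after running the merge loop.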

Further remarks

  • pyunpack (a Python frontend) with patool (a Python backend), plus an installed unrar or p7zip-rar (7z with the non-free RAR code) on Linux, or 7z on Windows, can handle zip and rar (and many more formats) from Python.
  • 7z x has a -t switch to explicitly mark the input as a split archive (this may help if the first file is not named .001), e.g. 7z x -trar.split or 7z x -tzip.split, or something similar.
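The PatoolError in the question ("could not find an executable program to extract format rar; candidates are (rar,unrar,7z)") means none of those programs was found. A minimal sketch that checks for them with shutil.which before handing the archive to patoolib.extract_archive; the helper names are hypothetical, and it assumes one of the candidates is installed and on PATH:

```python
import shutil

# Programs patool lists as RAR candidates in the error message quoted in the question
RAR_CANDIDATES = ("rar", "unrar", "7z")


def find_rar_backend(candidates=RAR_CANDIDATES):
    """Return the first candidate program found on PATH, or None if none is installed."""
    for program in candidates:
        if shutil.which(program):
            return program
    return None


def extract_rar(archive, outdir):
    """Extract a RAR archive with patoolib, failing early with a clear message if no backend exists."""
    backend = find_rar_backend()
    if backend is None:
        raise RuntimeError("No RAR backend found on PATH; install one of: " + ", ".join(RAR_CANDIDATES))
    import patoolib  # third-party; imported here so the PATH check works without it installed
    patoolib.extract_archive(archive, outdir=outdir)
    return backend
```

Passing the first part of a multi-volume RAR to extract_rar leaves it to the chosen backend to find the remaining volumes.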
araisch
  • First of all, there is no part z00. Secondly, I checked that I merged the files in sorted order (and what that order was), and the order was right (I mean first .zip, then .z01, etc.). And the archive was still corrupted – KyluAce Jan 17 '22 at 11:18
  • OK, sorry, I tested it and it works. With RARs there is even less work because the extensions already have the right form; you just need to choose the first part. Bounty for you – KyluAce Jan 17 '22 at 12:41
  • BTW, is there no 7z library for Python that supports RAR extensions? – KyluAce Jan 17 '22 at 12:42