I have a fairly large zip archive (~200 GB) that contains several other zip archives, and I have to move files around inside it. This works perfectly fine as long as I only deal with one level of nesting, but as soon as I have to modify an archive inside the archive, it does not work as expected.

Simple example of structure:

Archive.zip
├── Folder1
│   ├── **Archive1_1.zip**
│   │   └── Folder1_1_1
│   │   │   └── stuff I have to work with...
│   ├── Archive1_2.zip
│   │   └── Folder1_2_1
│   │   │   └── stuff I have to work with...
│   └── Archive1_3.zip
│   │   └── another Folder1_3_1
│   │   │   └── stuff I have to work with...
├── Folder2
│   ├── Archive2_1.zip
│   │   └── Folder2_1_1
│   │   │   └── stuff I have to work with...
│   └── Folder2_2
│   │   └── stuff I have to work with...
└── Folder3
    └── Folder3_1
        └── stuff I have to work with...

As seen above, sometimes I only have to work inside the 'root' archive, where copying and moving files works perfectly fine. But as soon as I have to do the exact same thing inside e.g. Archive1_1, it does not work. I can read the data inside, but as soon as I have to write, it won't work.

-> It kind of works: no exception is thrown, but after I write a file, the file does not exist.

For example:

I want to write file_C to "./foo/bar/file_C.txt" inside of Archive1_1.zip, where there are already two other files (file_A and file_B). Before I write, zipfile reports 4 entries in the archive's filelist, two of them directories. After I write, there are 5 entries, but when I inspect the archive with 7z, file_C does not exist.


Some code to work with:

How do I open my zip archives?

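import re
import zipfile
from io import BytesIO

with zipfile.ZipFile(zipPath, mode='a') as root_archive:
    for file_name in root_archive.namelist():
        # Nested archives are recognised by their .zip extension.
        if re.search(r'\.zip$', file_name) is not None:
            # Load the nested archive into memory and open it from there.
            zip_archive = BytesIO(root_archive.read(file_name))
            with zipfile.ZipFile(zip_archive, mode='a') as sub_archive:
                start(sub_archive)

    start(root_archive)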

First I open my root archive and then check whether it contains any zip archives. If so, I open them and call the function start(archive). A lot happens in there, and when it's all done, I want to write.
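
To make this opening pattern easier to experiment with, here is a minimal sketch that uses tiny in-memory archives instead of the real 200 GB file. The member names and contents are made up for illustration. It relies on one thing that is documented behaviour: ZipFile.read() returns the member's bytes, so a BytesIO built from them is a separate copy, and writing through a ZipFile opened on that copy does not touch the member stored in the root archive.

```python
import io
import zipfile

# Build a small inner archive in memory (stand-in for Archive1_1.zip).
inner_buf = io.BytesIO()
with zipfile.ZipFile(inner_buf, mode="w") as inner:
    inner.writestr("foo/bar/file_A.txt", "A")
    inner.writestr("foo/bar/file_B.txt", "B")

# Build the root archive in memory and store the inner archive in it.
root_buf = io.BytesIO()
with zipfile.ZipFile(root_buf, mode="w") as root:
    root.writestr("Folder1/Archive1_1.zip", inner_buf.getvalue())

with zipfile.ZipFile(root_buf, mode="a") as root:
    # read() hands back bytes; the BytesIO wraps that copy only.
    copy_buf = io.BytesIO(root.read("Folder1/Archive1_1.zip"))
    with zipfile.ZipFile(copy_buf, mode="a") as sub:
        sub.writestr("foo/bar/file_C.txt", "C")
        names_in_copy = sub.namelist()

    # Re-read the member that is actually stored in the root archive.
    stored_buf = io.BytesIO(root.read("Folder1/Archive1_1.zip"))
    with zipfile.ZipFile(stored_buf) as check:
        names_in_root = check.namelist()

print("in-memory copy:", names_in_copy)
print("stored in root:", names_in_root)
```

If the same thing happens in the real script, the modified buffer would still have to be written back into the root archive. As far as I know, zipfile offers no call to replace or delete an existing member in place, so that would presumably mean rebuilding the root archive.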


How do I write?

config.zip_archive.write(f"./tmp/{ident}{constants.SUFFIX}", 
                         f"{path}/{ident}{constants.SUFFIX}", 
                         compress_type=ZIP_DEFLATED)

At this point I have edited the file locally and want to append it to my archive. In the example above, path would be "foo/bar" (since I opened Archive1_1), ident = "file_C" and constants.SUFFIX = ".txt".
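
For clarity, this is roughly what the write call does with those example values, as a standalone sketch. The temporary directory, the archive file name and the plain variables standing in for config and constants are assumptions made for the example, not my real setup. The first argument of ZipFile.write() is the file on disk, the second is the name it gets inside the archive.

```python
import os
import tempfile
import zipfile

# Hypothetical stand-ins for the values described above.
path = "foo/bar"
ident = "file_C"
SUFFIX = ".txt"

with tempfile.TemporaryDirectory() as work_dir:
    # The locally edited file that should end up in the archive.
    local_file = os.path.join(work_dir, f"{ident}{SUFFIX}")
    with open(local_file, "w") as fh:
        fh.write("edited content")

    # Mode 'a' creates the archive if it does not exist yet.
    archive_path = os.path.join(work_dir, "Archive1_1.zip")
    with zipfile.ZipFile(archive_path, mode="a") as archive:
        archive.write(local_file,
                      f"{path}/{ident}{SUFFIX}",
                      compress_type=zipfile.ZIP_DEFLATED)

    # Reopen read-only to see what was actually stored on disk.
    with zipfile.ZipFile(archive_path) as archive:
        stored_names = archive.namelist()
        stored_method = archive.getinfo(f"{path}/{ident}{SUFFIX}").compress_type

print(stored_names)
```

Against an archive that lives on disk like this, the new member is there after reopening, which matches what I see when I only work on the root archive.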


How do I know the write was successful?

I added the following code to the start and end of my algorithm:

print(f"Files before/after: {len(archive.filelist)}")

In my example from before it would say:

Archive1_1:
Files before: 4
.
.
.
Files after: 5

When I start the script again, it also detects the file I added before, but I cannot see it with 7z when I look for it manually.
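
A check that does not depend on the ZipFile object that did the writing might be more reliable than counting filelist entries. The sketch below reopens the root archive read-only and descends into the nested archive, which is closer to what 7z sees. The helper name nested_names and the demo archive are hypothetical, and each nested archive is loaded fully into memory, which may not be practical for very large members.

```python
import io
import zipfile


def nested_names(root_file, inner_members):
    """Return the member names of an archive nested inside root_file.

    inner_members is the chain of archive members to descend through,
    e.g. ["Folder1/Archive1_1.zip"]. Everything is opened read-only.
    """
    current = zipfile.ZipFile(root_file, mode="r")
    try:
        for member in inner_members:
            data = io.BytesIO(current.read(member))
            current.close()
            current = zipfile.ZipFile(data, mode="r")
        return current.namelist()
    finally:
        current.close()


# Demo with a tiny in-memory root archive (made-up contents).
inner_buf = io.BytesIO()
with zipfile.ZipFile(inner_buf, mode="w") as inner:
    inner.writestr("foo/bar/file_A.txt", "A")

root_buf = io.BytesIO()
with zipfile.ZipFile(root_buf, mode="w") as root:
    root.writestr("Folder1/Archive1_1.zip", inner_buf.getvalue())

names = nested_names(root_buf, ["Folder1/Archive1_1.zip"])
print("file_C present:", "foo/bar/file_C.txt" in names)
```

Passing a path on disk instead of root_buf should work the same way, since ZipFile accepts either a path or a file-like object.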


What did I do wrong there?

How can it be that the file is detected by Python's zipfile but not by 7z?

Thanks for the help in advance!

Jason Aller
CSharper96