I have a fairly large zip archive (~200 GB) that contains several other archives. I have to move files around inside this archive. This works perfectly fine as long as I only deal with one level of archive depth, but as soon as I have to manipulate an archive inside an archive, it does not work as expected.
Simple example of structure:
Archive.zip
├── Folder1
│   ├── **Archive1_1.zip**
│   │   └── Folder1_1_1
│   │       └── stuff I have to work with...
│   ├── Archive1_2.zip
│   │   └── Folder1_2_1
│   │       └── stuff I have to work with...
│   └── Archive1_3.zip
│       └── another Folder1_3_1
│           └── stuff I have to work with...
├── Folder2
│   ├── Archive2_1.zip
│   │   └── Folder2_1_1
│   │       └── stuff I have to work with...
│   └── Folder2_2
│       └── stuff I have to work with...
└── Folder3
    └── Folder3_1
        └── stuff I have to work with...
As seen above, sometimes I only have to work inside the 'root' archive, and copying and moving files there works perfectly fine. But as soon as I have to do the exact same procedure inside e.g. Archive1_1, it does not work. I can read the data inside, but as soon as I have to write, it won't work.
-> It kind of works, in that no exception is thrown, but after I write a file, the file does not exist.
For example:
I want to write file_C to "./foo/bar/file_C.txt" inside Archive1_1.zip, which already contains two other files (file_A and file_B). Before I write, zipfile reports 4 entries in the archive's filelist, two of them directories. After I write, there are 5 entries, but when I inspect the archive with 7z, file_C does not exist.
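A minimal sketch of this situation, assuming the nested archive is loaded into a BytesIO and opened from there. The helper names (`build_root`, `nested_names`, `append_in_memory_only`) and the tiny test archive are hypothetical and only stand in for the real 200 GB file. It may not match the real script in every detail, but it shows an entry count that grows on the in-memory object while the nested archive stored on disk stays unchanged.

```python
import io
import os
import tempfile
import zipfile

NESTED = "Folder1/Archive1_1.zip"  # hypothetical member name


def build_root(path):
    """Create a small root archive that contains one nested archive."""
    inner = io.BytesIO()
    with zipfile.ZipFile(inner, "w") as nested:
        nested.writestr("foo/bar/file_A.txt", "A")
        nested.writestr("foo/bar/file_B.txt", "B")
    with zipfile.ZipFile(path, "w") as root:
        root.writestr(NESTED, inner.getvalue())


def nested_names(path):
    """Re-read the nested archive from disk and list its entries."""
    with zipfile.ZipFile(path) as root:
        data = root.read(NESTED)
    with zipfile.ZipFile(io.BytesIO(data)) as nested:
        return nested.namelist()


def append_in_memory_only(path):
    """Append to the nested archive through a BytesIO copy only."""
    with zipfile.ZipFile(path, "a") as root:
        buffer = io.BytesIO(root.read(NESTED))
        with zipfile.ZipFile(buffer, "a") as nested:
            before = len(nested.filelist)
            nested.writestr("foo/bar/file_C.txt", "C")
            after = len(nested.filelist)
    # The buffer now holds the new entry, but it was never stored in root.
    return before, after


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        root_path = os.path.join(folder, "Archive.zip")
        build_root(root_path)
        print(append_in_memory_only(root_path))  # (2, 3)
        print(nested_names(root_path))           # file_C.txt is not listed
```

The counts here are 2 and 3 rather than 4 and 5 because this sketch creates no explicit directory entries.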
Some code to work with:
How do I open my zip archives?
import re
import zipfile
from io import BytesIO

with zipfile.ZipFile(zipPath, mode='a') as root_archive:
    for file_name in root_archive.namelist():
        if re.search(r'\.zip$', file_name) is not None:
            # Load the nested archive into memory and open it from there
            zip_archive = BytesIO(root_archive.read(file_name))
            with zipfile.ZipFile(zip_archive, mode='a') as sub_archive:
                start(sub_archive)
    start(root_archive)
First I open my root archive and then check whether it contains any zip archives. If so, I open them and call the function start(archive). A lot happens in there, and when it is all done, I want to write.
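One thing that may matter here: a nested archive opened from a BytesIO is only a copy in memory, so changes to it would have to be stored back into the root archive explicitly. The following is a hedged sketch of that idea, not a confirmed fix for my script. The helpers `replace_member` and `append_to_nested` are hypothetical names. Since zipfile has no call to overwrite an existing member in place, the sketch rebuilds the root archive into a temporary file, which reads each member fully into memory and would be expensive for a 200 GB archive.

```python
import io
import os
import tempfile
import zipfile


def replace_member(root_path, member_name, new_bytes):
    """Rebuild the archive at root_path with one member's content swapped."""
    folder = os.path.dirname(os.path.abspath(root_path))
    fd, tmp_path = tempfile.mkstemp(suffix=".zip", dir=folder)
    os.close(fd)
    with zipfile.ZipFile(root_path, "r") as src, \
            zipfile.ZipFile(tmp_path, "w") as dst:
        for info in src.infolist():
            if info.filename == member_name:
                # Store the updated nested archive under the old name.
                dst.writestr(member_name, new_bytes,
                             compress_type=zipfile.ZIP_DEFLATED)
            else:
                # Copy every other member unchanged (whole member in memory).
                dst.writestr(info, src.read(info.filename))
    os.replace(tmp_path, root_path)


def append_to_nested(root_path, member_name, arcname, data):
    """Add one entry to a nested archive and store the result in the root."""
    with zipfile.ZipFile(root_path, "r") as root:
        buffer = io.BytesIO(root.read(member_name))
    with zipfile.ZipFile(buffer, "a", zipfile.ZIP_DEFLATED) as nested:
        nested.writestr(arcname, data)
    # Closing the nested archive writes its central directory into buffer.
    replace_member(root_path, member_name, buffer.getvalue())
```

A call would look like `append_to_nested("Archive.zip", "Folder1/Archive1_1.zip", "foo/bar/file_C.txt", b"...")`, with all arguments chosen for illustration.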
How do I write?
config.zip_archive.write(f"./tmp/{ident}{constants.SUFFIX}",
                         f"{path}/{ident}{constants.SUFFIX}",
                         compress_type=ZIP_DEFLATED)
So at this point I have edited the file locally and want to append it to my archive. For the example above, the variable path would be "foo/bar" since I opened Archive1_1, ident = "file_C" and constants.SUFFIX = ".txt".
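To make the write call concrete, here is a small self-contained sketch with the example values filled in. The helper `add_edited_file`, the temporary directory, and the in-memory target archive are assumptions for illustration. The first argument of `ZipFile.write` is the local file to read and the second is the name the entry gets inside the archive.

```python
import io
import os
import tempfile
import zipfile


def add_edited_file(archive, local_path, path, ident, suffix):
    """Append a locally edited file under path/ident+suffix in the archive."""
    arcname = f"{path}/{ident}{suffix}"
    archive.write(local_path, arcname, compress_type=zipfile.ZIP_DEFLATED)
    return arcname


buffer = io.BytesIO()  # stands in for the opened archive's storage
with tempfile.TemporaryDirectory() as tmp_dir:
    # Stand-in for the edited file under ./tmp/
    local_path = os.path.join(tmp_dir, "file_C.txt")
    with open(local_path, "w") as handle:
        handle.write("edited content")
    with zipfile.ZipFile(buffer, mode="a") as archive:
        arcname = add_edited_file(archive, local_path,
                                  "foo/bar", "file_C", ".txt")
        names = archive.namelist()

print(arcname)  # foo/bar/file_C.txt
```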
How do I know, write was successful?
I added following code to the start and end of my algorithm:
print(f"Files before/after: {len(archive.filelist)}")
In my example from before it would say:
Archive1_1:
Files before: 4
.
.
.
Files after: 5
When I start the script again, it also detects the file I added before, but I cannot see it with 7z when I look for it manually.
What did I do wrong there?
How can it be that the file is detected by Python's zipfile but not by 7z?
Thanks for the help in advance!