I need to concatenate some files from GitHub that have been split into several pieces because of their size (as in this dataset: https://github.com/kang-gnak/eva-dataset).
Using request, these end up in my temporary data storage with names running from File_Name.zip.001 to File_Name.zip.007.
The completed file contains images rather than text, so I haven't found a straightforward way to rebuild File_Name.zip from code.
Is anyone aware of a solution that would work directly in Colab?
I am looking for both repeatability and the ability to share my code as a Colab notebook, so I am trying to avoid solutions that involve having to download and rebuild the file locally and reuploading it each time. I would also prefer not to have to make an online copy of existing data if there's a way to rebuild and unzip the file directly from the code.
Thanks in advance.
I tried putting the parts' file names in a list assigned to data_zip_parts and running the following code:
with zipfile.ZipFile(data_path / "File_Name.zip", 'a') as full_zip:
    for file_name in data_zip_parts:
        part = zipfile.ZipFile(data_path / file_name, 'r')
        for name in part.namelist():
            full_zip.writestr(name, part.read(name))
However, it looks like the individual parts cannot be read as zip files on their own, so I get the following error:
BadZipFile: File is not a zip file
Just a reminder that I want to do this directly within Google Colab. I have asked a few peers, but most of them suggested solutions that run on my local system, such as the command line or 7zip, which isn't quite what I'm looking for. I expect there is a way to work around this format, and I would appreciate any assistance.
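For reference, here is a minimal sketch of one approach that may work entirely inside a notebook. It assumes the .001 to .007 files are a plain byte-level split of a single zip (which is how 7zip usually produces such parts), not independent archives. If that assumption holds, joining the raw bytes in order should recreate File_Name.zip, which zipfile can then open. The helper names join_parts and rebuild_and_extract and the "extracted" output folder are hypothetical, chosen only for illustration.

```python
import shutil
import zipfile
from pathlib import Path


def join_parts(part_paths, target_path):
    """Concatenate the split parts, in the given order, into one file."""
    with open(target_path, "wb") as target:
        for part_path in part_paths:
            with open(part_path, "rb") as part:
                # Stream the bytes so large parts are not loaded into memory at once.
                shutil.copyfileobj(part, target)
    return Path(target_path)


def rebuild_and_extract(data_path, base_name="File_Name.zip", extract_dir=None):
    """Rebuild base_name from base_name.001, .002, ... and unzip it."""
    data_path = Path(data_path)
    # Zero-padded suffixes sort correctly as plain strings.
    parts = sorted(data_path.glob(base_name + ".[0-9][0-9][0-9]"))
    if not parts:
        raise FileNotFoundError(f"No parts found for {base_name} in {data_path}")
    full_zip = join_parts(parts, data_path / base_name)
    # Assumed output location; change as needed.
    extract_dir = Path(extract_dir) if extract_dir else data_path / "extracted"
    with zipfile.ZipFile(full_zip) as archive:
        archive.extractall(extract_dir)
    return extract_dir
```

Calling rebuild_and_extract(data_path) after the download step would then leave the images under data_path / "extracted". If the joined file still raises BadZipFile, the parts were probably not split this way, or one of them is missing or incomplete.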