
I'm working with the following block of code in an attempt to extract data from a zip file:

import zipfile

def get_zip(filenam,targetdir):
    with zipfile.ZipFile(filenam,"r") as zip_ref:
        zip_ref.extractall(targetdir)

zip_file = 'coolThing.zip'
targetdir = 'C:/puItHere/'
get_zip(zip_file,targetdir)

However, I get the error

"BadZipFile: Bad magic number for file header"

Looking through previous threads like this one, I found that my zip file needs to start with the header "\x50\x4B\x03\x04", but it actually starts with b'PK\x03\x04'.
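A minimal diagnostic sketch, assuming the archive's central directory is still readable. `find_bad_members` and `LOCAL_HEADER_MAGIC` are hypothetical names. zipfile raises "Bad magic number for file header" when it seeks to a member's recorded `header_offset` and does not find the local file header signature there. So the first four bytes of the file can be correct while a later member is broken. This checks every member's local header instead of only the start of the file:

```python
import zipfile

# The ZIP local file header signature. b"\x50\x4B\x03\x04" and b"PK\x03\x04"
# are the same four bytes, just written differently.
LOCAL_HEADER_MAGIC = b"PK\x03\x04"


def find_bad_members(path):
    """Return names of members whose local header lacks the ZIP signature."""
    bad = []
    # Read the member list from the central directory, then inspect the raw
    # bytes at each member's recorded local-header offset.
    with zipfile.ZipFile(path) as zf, open(path, "rb") as raw:
        for info in zf.infolist():
            raw.seek(info.header_offset)
            if raw.read(4) != LOCAL_HEADER_MAGIC:
                bad.append(info.filename)
    return bad
```

If this returns an empty list but extraction still fails, the problem is probably elsewhere, for example an unsupported compression method or a CRC mismatch.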

Does anyone know a way to use zipfile, pyunpack, or any other library to extract what I need from this file type? I'm pulling data from a large repository and will be iterating through 30 TB of data, taking only what I need out of each zip file. So far, every file I've seen uses the same header.
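For iterating over many archives and taking only some members, here is a sketch using only the standard library. `extract_matching`, `extract_from_folder`, and the `wanted` predicate are hypothetical helpers. The idea is to catch `zipfile.BadZipFile` per member and per archive, so that one damaged file does not stop a long run:

```python
import zipfile
from pathlib import Path


def extract_matching(zip_path, targetdir, wanted):
    """Extract members where wanted(name) is True; skip ones zipfile can't read.

    Returns (extracted, failed) lists of member names.
    """
    extracted, failed = [], []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir() or not wanted(info.filename):
                continue
            try:
                zf.extract(info, targetdir)
                extracted.append(info.filename)
            except zipfile.BadZipFile:
                # e.g. "Bad magic number for file header" on this member only
                failed.append(info.filename)
    return extracted, failed


def extract_from_folder(folder, targetdir, wanted):
    """Walk every .zip under folder and pull out only the wanted members."""
    for zip_path in sorted(Path(folder).rglob("*.zip")):
        try:
            _, failed = extract_matching(zip_path, targetdir, wanted)
        except zipfile.BadZipFile:
            # The central directory itself could not be read.
            print(f"{zip_path}: not a readable zip file")
            continue
        if failed:
            print(f"{zip_path}: could not read {failed}")
```

For example, `extract_from_folder("D:/repo", "C:/puItHere/", lambda name: name.endswith(".csv"))` would pull only CSV members. The folder path and the `.csv` filter are placeholders.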

Thanks!

PLundquist
    "\x50\x4B\x03\x04" is the exact same string as "PK\x03\x04". (I think it displays differently in the linked answer because they're using Python 2) – SuperStormer Sep 18 '22 at 19:54
  • Did you try the code under "Update" in the linked answer? – SuperStormer Sep 18 '22 at 19:57
  • Yessir, same result – PLundquist Sep 18 '22 at 19:58
  • Do non-Python methods work (e.g. unzip, 7z)? – SuperStormer Sep 18 '22 at 20:01
  • Using WinRAR to extract what I need works fine, but I'd like to iterate through the files. I tried wrapping WinRAR in a system call but got this: SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 3-4: truncated \UXXXXXXXX escape – PLundquist Sep 18 '22 at 20:05
  • You can always use Python to generate a batch file that iterates over the files. I did that in the past for other unsupported things; it is not the most elegant approach, but it works. – Zaero Divide Sep 18 '22 at 20:58
  • Your question lacks information about how the archive is created, which is needed for a minimal reproducible example. Please, as a new user here, also take the tour and read How to Ask. – Ulrich Eckhardt Sep 18 '22 at 21:20
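The SyntaxError quoted in the WinRAR comment is what Python reports when a Windows path such as "C:\Users\..." is written in an ordinary string literal: "\U" starts a \UXXXXXXXX escape. A raw string (r"...") or forward slashes avoid it. Below is a sketch of calling WinRAR once per archive with `subprocess`. The install path, the helper names, and the trailing-backslash destination convention are assumptions, so check WinRAR's own command-line help. It uses WinRAR's `x` (extract with full paths) command:

```python
import subprocess
from pathlib import Path

# Hypothetical install location; adjust to where WinRAR actually lives.
# The r prefix keeps the backslashes literal, so "\P" and "\W" are not escapes.
WINRAR = r"C:\Program Files\WinRAR\WinRAR.exe"


def winrar_command(archive, targetdir, exe=WINRAR):
    """Build the argument list for WinRAR's 'x' (extract with full paths)."""
    # Assumption: WinRAR expects the destination folder to end in a backslash.
    dest = str(targetdir).rstrip("\\/") + "\\"
    return [exe, "x", str(archive), dest]


def extract_all_with_winrar(folder, targetdir):
    """Run WinRAR once per .zip file in folder (Windows only)."""
    for archive in sorted(Path(folder).glob("*.zip")):
        # Passing a list avoids shell-quoting problems with spaces in paths.
        subprocess.run(winrar_command(archive, targetdir), check=True)
```

This is the same idea as the batch-file suggestion, but driven directly from Python, so a failing archive raises `CalledProcessError` at the point where it happens.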
