
Thank you, snakecharmerb. This was my solution. Thank you for helping an I/O amateur. It occasionally throws errors, which doesn't happen if I decompress and then search, but now I'm off and running.

I've got a few TB of files sitting in .zip files with a compression ratio of about 100:1.

The files are text files, but they all end with some other extension, like .lol. If they ended in .txt, the following code would work, letting me search through the files in the archive without saving them to disk and record each matching sub-file's name in a new file, so I can see which files contain a particular keyword 'searchedstring':

But the files don't end in .txt! Is there any way to force ZipFile to interpret a file in the archive as a .txt? The only way I've found to do this is to decompress, rename the files, and recompress the archives.

import os
import time
import zipfile

# filename (path to the .zip) and searchedstring (the keyword, a str) are defined elsewhere.
needle = searchedstring.encode()  # z.open() yields bytes, so compare bytes with bytes

# Write the list of matching members next to the archive.
pathy = os.path.dirname(os.path.abspath(filename))
textypath = os.path.join(pathy, searchedstring + '.txt')

with zipfile.ZipFile(filename, mode='r') as z, open(textypath, 'a') as textey:
    for file in z.namelist():
        t = time.time()
        flag = 0
        # z.open() reads the member as bytes regardless of its extension
        with z.open(file) as contents:
            # Loop through the file line by line
            for line in contents:
                # checking string is present in line or not
                if needle in line:
                    flag = 1
                    break
        # record the member name once, after the line loop has finished
        if flag == 1:
            textey.write(file + '\n')
        print(time.time() - t)
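
For anyone who wants to try the idea without a real archive, here is a minimal sketch of the same search as a self-contained function. The helper name find_members_containing, the sample member names, and the UTF-8 encoding are all assumptions for illustration, not part of my actual setup. It relies on the fact that ZipFile.open() returns the member's bytes whatever its extension, and wraps that stream in io.TextIOWrapper so the keyword can stay a str.

```python
import io
import zipfile


def find_members_containing(archive, keyword, encoding="utf-8"):
    """Return names of archive members with at least one line containing keyword.

    `archive` is a path or a file-like object accepted by zipfile.ZipFile.
    The encoding is an assumption; adjust it to match the real files.
    """
    matches = []
    with zipfile.ZipFile(archive, mode="r") as z:
        for info in z.infolist():
            if info.is_dir():
                continue  # directory entries have no content to search
            with z.open(info) as raw:
                # Decode lazily, line by line; errors="replace" keeps
                # undecodable bytes from raising UnicodeDecodeError.
                text = io.TextIOWrapper(raw, encoding=encoding, errors="replace")
                if any(keyword in line for line in text):
                    matches.append(info.filename)
    return matches


# Build a small in-memory archive with non-.txt members to try it out.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w", compression=zipfile.ZIP_DEFLATED) as z:
    z.writestr("a.lol", "first line\nhas searchedstring here\n")
    z.writestr("b.lol", "nothing to see\n")
buffer.seek(0)

found = find_members_containing(buffer, "searchedstring")
print(found)
```

Nothing is extracted to disk here; each member is decompressed as a stream and the search stops at the first matching line. Whether errors="replace" is the right choice depends on what the real files contain, so treat it as a starting point.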
Matt Reed
