6

I would like to report the percentage completed while zipping a file, for instance by printing 1%, 2%, 3%, and so on. I have no idea where to start. How would I go about doing this? Right now I just have the code to zip the file.

Code:

zipPath = zipfile.ZipFile("Files/Zip/" + pic + ".zip", "w")

for root, dirs, files in os.walk(filePath):
    for file in files:
        zipPath.write(os.path.join(root, file), str(pic) + "\\" + file)

print("Done")
zipPath.close()
  • You might want to use progress library: https://pypi.python.org/pypi/progress/ – Mikko Ohtamaa Feb 15 '15 at 03:23
  • Actually, looks like you have quite a few libraries you can choose from: https://pypi.python.org/pypi?:action=search&term=progress&submit=search – Mikko Ohtamaa Feb 15 '15 at 03:24
  • Since you have no idea how big the file will be when you're done, it seems hard to provide any useful output about progress so far (@MikkoOhtamaa's comment is a great idea on how to display progress if you **know** how far along you are, but, in this case, how **can** you know?!) – Alex Martelli Feb 15 '15 at 03:26
  • It is not like this information is hidden anywhere. Just walk through source files twice and on the first iteration count the sum of the sizes. – Mikko Ohtamaa Feb 15 '15 at 03:28
  • @Mikko: The filesystem can change at any time... While that is a workable solution, make sure you don't crash or misbehave (beyond incorrect progress display) if files appear or disappear at the wrong time. – Kevin Feb 15 '15 at 03:37
  • If the `filePath` tree is large and almost static and you run the script regularly then you could save the statistics from the previous run and use it to estimate progress for the current one. A simple way is to [use `tqdm` to report progress in the terminal](https://github.com/noamraph/tqdm) – jfs Feb 15 '15 at 04:24

2 Answers

3

Unfortunately, you can't get progress on the compression of each individual file from the zipfile module, but you can get an idea of the total progress by keeping track of how many bytes you've processed so far.

As Mikko Ohtamaa suggested, the easiest way to do this is to walk through the file list twice: first to determine the file sizes, and then to do the compression. However, as Kevin mentioned, the contents of the directory could change between these two passes, so the numbers may be inaccurate.

The program below (written for Python 2.6) illustrates the process.

#!/usr/bin/env python

''' zip all the files in dirname into archive zipname

    Use only the last path component in dirname as the 
    archive directory name for all files

    Written by PM 2Ring 2015.02.15

    From http://stackoverflow.com/q/28522669/4014959
'''

import sys
import os
import zipfile


def zipdir(zipname, dirname):
    #Get total data size in bytes so we can report on progress
    total = 0
    for root, dirs, files in os.walk(dirname):
        for fname in files:
            path = os.path.join(root, fname)
            total += os.path.getsize(path)

    #Get the archive directory name
    basename = os.path.basename(dirname)

    z = zipfile.ZipFile(zipname, 'w', zipfile.ZIP_DEFLATED)

    #Current data byte count
    current = 0
    for root, dirs, files in os.walk(dirname):
        for fname in files:
            path = os.path.join(root, fname)
            arcname = os.path.join(basename, fname)
            # Guard against a zero total (empty directory or only empty files)
            percent = 100 * current / total if total else 100
            print '%3d%% %s' % (percent, path)

            z.write(path, arcname)
            current += os.path.getsize(path)
    z.close()


def main():
    if len(sys.argv) < 3:
        print 'Usage: %s zipname dirname' % sys.argv[0]
        sys.exit(1)

    zipname = sys.argv[1]
    dirname = sys.argv[2]
    zipdir(zipname, dirname)


if __name__ == '__main__':
    main()

Note that I open the zip file with the zipfile.ZIP_DEFLATED compression argument; the default is zipfile.ZIP_STORED, i.e., no compression is performed. Also, zip files can cope with both DOS-style and Unix-style path separators, so you don't need to use backslashes in your archive pathnames, and as my code shows you can just use os.path.join() to construct the archive pathname.


BTW, in your code you have str(pic) inside your inner for loop. In general, it's a bit wasteful re-evaluating a function with a constant argument inside a loop. But in this case, it's totally superfluous, since from your first statement it appears that pic is already a string.
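
For Python 3, the same two-pass idea might look like the sketch below. It is only an illustration, not a drop-in replacement for the program above: the function name `zip_dir_with_progress` and the `report` callback are made up for this example, and unlike the original it keeps the subdirectory structure inside the archive instead of flattening every file into one directory. It also tolerates files that disappear between the two passes, at the cost of a slightly inaccurate percentage.

```python
import os
import zipfile


def zip_dir_with_progress(zip_name, dir_name, report=print):
    """Zip dir_name into zip_name, calling report(percent, path) per file.

    Hypothetical helper for illustration; report defaults to print.
    """
    # First pass: record the size of every file we intend to archive.
    sizes = {}
    for root, _dirs, files in os.walk(dir_name):
        for fname in files:
            path = os.path.join(root, fname)
            try:
                sizes[path] = os.path.getsize(path)
            except OSError:
                pass  # file vanished or is unreadable; leave it out
    total = sum(sizes.values())

    # Archive names are relative to the parent of dir_name, so the last
    # path component of dir_name becomes the top-level archive directory.
    parent = os.path.dirname(os.path.abspath(dir_name))

    # Second pass: compress, counting bytes of source data handled so far.
    done = 0
    with zipfile.ZipFile(zip_name, 'w', zipfile.ZIP_DEFLATED) as zf:
        for path, size in sizes.items():
            arcname = os.path.relpath(os.path.abspath(path), parent)
            try:
                zf.write(path, arcname)
            except FileNotFoundError:
                pass  # file disappeared between the two passes; skip it
            done += size
            percent = 100 * done // total if total else 100
            report(percent, path)
```

Calling `zip_dir_with_progress("out.zip", "some_dir")` prints one line per file; passing a different `report` callable lets you feed a progress bar instead.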

PM 2Ring
  • This is somewhat similar to a solution that I made by having the uncompressed size and then doing a calculation to get the total percent. –  Feb 15 '15 at 05:40
-1

The existing answer works only on a file level, i.e. if you have a single huge file to zip you would not see any progress until the whole operation is finished. In my case I just had one huge file, and I did something like this:

import os
import types
import zipfile
from functools import partial

if __name__ == '__main__':
    out_file = "out.zip"
    in_file = "/path/to/file/to/zip"

    def progress(total_size, original_write, self, buf):
        progress.bytes += len(buf)
        progress.obytes += 1024 * 8  # Hardcoded in zipfile.write
        print("{} bytes written".format(progress.bytes))
        print("{} original bytes handled".format(progress.obytes))
        # Clamp to 100: obytes grows by a fixed amount per call and can overshoot
        print("{} % done".format(min(100, int(100 * progress.obytes / total_size))))
        return original_write(buf)
    progress.bytes = 0
    progress.obytes = 0

    with zipfile.ZipFile(out_file, 'w', compression=zipfile.ZIP_DEFLATED) as _zip:
        # Replace original write() with a wrapper to track progress
        _zip.fp.write = types.MethodType(partial(progress, os.path.getsize(in_file),
                                                 _zip.fp.write), _zip.fp)
        _zip.write(in_file)

This is not optimal, since it hardcodes the number of bytes handled per call to write(), and that number could change.

Also, the function is called quite frequently, so updating a UI should probably not be done on every call.
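
If you are on Python 3.6 or later, one way to avoid both the monkey-patching and the hardcoded chunk size might be to stream the file into the archive yourself with `ZipFile.open(..., 'w')`. This is a minimal sketch under that assumption; the function name `zip_file_with_progress`, the `report` callback and the 1 MiB default chunk size are choices made for this example only.

```python
import os
import zipfile


def zip_file_with_progress(src, zip_name, report=print,
                           chunk_size=1024 * 1024):
    """Zip the single file src into zip_name, reporting percent per chunk.

    Hypothetical helper for illustration; requires Python 3.6+.
    """
    total = os.path.getsize(src)
    done = 0
    with zipfile.ZipFile(zip_name, 'w') as zf:
        # Build the member's metadata ourselves so we can stream into it.
        info = zipfile.ZipInfo.from_file(src, os.path.basename(src))
        info.compress_type = zipfile.ZIP_DEFLATED
        with open(src, 'rb') as fin, zf.open(info, 'w') as fout:
            while True:
                chunk = fin.read(chunk_size)
                if not chunk:
                    break
                fout.write(chunk)  # compressed as it is written
                done += len(chunk)
                # We control the chunk size, so the byte count is exact.
                percent = 100 * done // total if total else 100
                report(min(100, percent))
```

Because `report` is called once per chunk, a larger `chunk_size` is a simple way to reduce how often a UI gets updated.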

Zitrax
  • "AttributeError: 'file' object attribute 'write' is read-only" I got the above error. It didn't like me altering a system object method call. – Sophie McCarrell Oct 13 '17 at 20:06
  • @JasonMcCarrell The code still works for me exactly as written above, just copied and pasted and run using Python 3.6 with only `in_file` changed to an existing file. I guess you must do something differently? – Zitrax Oct 14 '17 at 09:29
  • Ah, I was running it with Python 2.7. No idea why we use such an archaic version at work. – Sophie McCarrell Oct 14 '17 at 21:37