-3

I'm making archive of a folder that has around ~1 GB of files in it. It takes like 1 or 2 minutes but I want it to be faster.

I am making a UI app in Python that allows you to ZIP files (it's a project, I know stuff like 7Zip exists lol), and I am using a folder with ~1GB of files in it, like I said. The program won't accept files over 5GB because of the speed.

Here is how I compress/archive the folder: shutil.make_archive('files_archive', 'zip', 'folder_to_archive')

I know I can change the format but I'm not sure which one is fast and can compress well.

So, my end goal is to compress files fast, whilst also reducing size.

WhatTheClown
  • 464
  • 1
  • 7
  • 24
  • tar is likely the fastest, but also provides effectively no compression. otherwise zip is probably the runner up – Alexander Jan 01 '23 at 11:29
  • 1
    "well I do, the folder has to be under 5GB." "I'm making archive of a folder that has around ~1 GB of files in it." I can't understand. Doesn't the folder already meet the requirement? Therefore, why do we have to do anything? Please read [ask] and https://stackoverflow.com/help/on-topic, and try to identify clear requirements. "Here is how I compress/archive the folder: shutil.make_archive('files_archive', 'zip', 'folder_to_archive')" Okay, so **what is wrong** with this approach? – Karl Knechtel Jan 01 '23 at 11:50
  • Hi, I wanted to update you on something: The thing I was making was for other people to use. *They* might try to zip folders/files over 5GB which is why my program doesn't accept that because it would be too slow. So, I needed a way to compress files fast so I can support up to 10GB instead or more. Not everything is going to be ran on my machine. I might send the script to friends too. Please, find common sense. – WhatTheClown Apr 07 '23 at 11:13

1 Answers1

1

shutil.make_archive offers zip, tar, gztar, bztar or xztar as archive formats.

tar is uncompressed so will be fastest, but uncompressed.

If you want to have faster compression than the built in formats, you will need an external command.

A common fast compression algorithm is LZ4.

I recommend you look into https://morotti.github.io/lzbench-web/ for various compression algorithms, and their compression ratio over speed.

But you will likely have to benchmark on a sample of own data.

ljmc
  • 4,830
  • 2
  • 7
  • 26
  • How do I put LZ4 into make_archive? When I tried it, it said `unknown archive format 'lz4'`. I have read your answer, and I do know that LZ4 is an algorithm, not a format but I thought I'd try it. – WhatTheClown Jan 01 '23 at 12:06
  • If you want to stick with `shutil.make_archive`, you’ll need to create an archiving function similar to [`_make_tarball`](https://github.com/python/cpython/blob/e83f88a455261ed53530a960f1514ab7af7d2e82/Lib/shutil.py#L884) or [`_make_zipfile`](https://github.com/python/cpython/blob/e83f88a455261ed53530a960f1514ab7af7d2e82/Lib/shutil.py#L954) for LZ4 and register it with [`shutil.register_archive_format`](https://docs.python.org/3/library/shutil.html#shutil.register_archive_format), otherwise, make the tar archive and run `lz4` externally with subprocess. – ljmc Jan 01 '23 at 12:34