To update a progress bar with the number of files to extract, my program goes over a list of ZIP files and sums the number of entries in each of them. The combined total is approximately 22,000 files.

The code I am using:

    foreach (string filepath in zipFiles)
    {
        ZipArchive zip = ZipFile.OpenRead(filepath);
        archives.Add(zip);
        filesCounter += zip.Entries.Count;
    }

However, zip.Entries.Count appears to perform some kind of traversal, and the count takes ages to complete (several minutes, and much longer if the internet connection is poor).

To get a sense of how much this could be improved, I compared the above with the performance of 7-Zip. I took one of the ZIP files, which contains ~11,000 files and folders:

  1. 2 seconds to open the archive in 7-Zip.
  2. 1 second to get the file properties.
  3. The properties show 10016 files + 882 folders, meaning it takes 7-Zip ~3 seconds to know there are 10898 entries in the ZIP file.

(Screenshot: 7-Zip Properties)

Any idea, suggestion, or alternative method that quickly counts the number of files would be appreciated.

  • Counting with DotNetZip is actually much faster, but due to some internal bureaucratic issues I can't use it. I need a solution that does not involve third-party libraries; I can still use the standard Microsoft libraries.
Juv
  • For those assisting, I recommend having a squiz at https://stackoverflow.com/questions/61880276/counting-the-number-of-files-using-zipfilearchive-is-very-slow . – mjwills May 19 '20 at 11:29
  • @mjwills - thanks, this is the previous post regarding this issue, unfortunately stackoverflow does not let you reopen it once it was linked. – Juv May 19 '20 at 11:34
  • SO does allow questions to be reopened - if it wasn't a duplicate. This question (and your original question) is a straight up duplicate. You are asking for the impossible here. If you want faster code, you need to be prepared to run _different_ code. – mjwills May 19 '20 at 11:38

1 Answer

I solved my progress bar issue by taking a different approach.

I simply sum the sizes of all the ZIP files, and that sum serves as the progress bar's maximum. Then, for each individual file that is extracted, I add its compressed size to the progress. This way the progress bar does not show the number of files; it shows how much of the compressed data has been processed (e.g. if I have 4 GB of ZIP files in total, then when the progress bar is 1/4 green I know about 1 GB of them has been extracted). This looks like a better representation of reality.

    foreach (string filepath in zipFiles)
    {
        ZipArchive zip = ZipFile.OpenRead(filepath);
        archives.Add(zip);

        // Accumulate the ZIP file sizes (filesCounter now holds bytes, not a file count).
        filesCounter += new FileInfo(filepath).Length;
    }

    // To utilize multiple processors it is possible to run this loop
    // in a separate thread for each ZipArchive -> currentZip!
    // :
    // :

    foreach (ZipArchiveEntry entry in currentZip.Entries)
    {
        // Doing my extract code here.
        // :
        // :

        // Accumulate the compressed size of each file.
        compressedFileSize += entry.CompressedLength;

        // Doing other stuff
        // :
        // :
    }
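For anyone who wants to try the idea end to end, here is a minimal, self-contained sketch of the same approach. It is not my production code: the names `ByteProgress`, `ExtractAll`, `targetDir` and `report` are made up for the example, and the extraction step is a plain `ExtractToFile` standing in for whatever your own extract code does.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

static class ByteProgress
{
    // Hypothetical helper: extracts every archive into targetDir and calls
    // report with (compressed bytes processed) / (total size of the ZIP files).
    public static void ExtractAll(IEnumerable<string> zipFiles, string targetDir, Action<double> report)
    {
        var paths = new List<string>(zipFiles);

        // The maximum is the sum of the ZIP file sizes; no archive is opened for this.
        long totalBytes = 0;
        foreach (string path in paths)
            totalBytes += new FileInfo(path).Length;

        // Normalized root with a trailing separator, used to reject entries
        // whose names would escape the target directory.
        string root = Path.GetFullPath(targetDir).TrimEnd(Path.DirectorySeparatorChar)
                      + Path.DirectorySeparatorChar;

        long doneBytes = 0;
        foreach (string path in paths)
        {
            using (ZipArchive zip = ZipFile.OpenRead(path))
            {
                foreach (ZipArchiveEntry entry in zip.Entries)
                {
                    string destination = Path.GetFullPath(Path.Combine(root, entry.FullName));
                    if (!destination.StartsWith(root, StringComparison.Ordinal))
                        throw new InvalidDataException("Entry escapes target directory: " + entry.FullName);

                    if (entry.Name.Length == 0)
                    {
                        // An empty Name means the entry is a directory.
                        Directory.CreateDirectory(destination);
                    }
                    else
                    {
                        Directory.CreateDirectory(Path.GetDirectoryName(destination));
                        entry.ExtractToFile(destination, overwrite: true);
                    }

                    // Advance by the compressed size of the entry just handled.
                    doneBytes += entry.CompressedLength;
                    report(totalBytes == 0 ? 1.0 : (double)doneBytes / totalBytes);
                }
            }
        }
    }
}
```

One thing to be aware of: a ZIP file also contains headers and a central directory, so the sum of the compressed entry sizes is always a little smaller than the file size. The last reported value therefore ends slightly below 1.0, and you may want to set the bar to full once the loop finishes.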

So the question of how to speed up zip.Entries.Count is still open, and I am still interested in knowing how to solve this specific issue. (What does 7-Zip do to be so quick? Maybe it uses DotNetZip or some C++ library.)
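One direction that might be worth trying with only the standard libraries, and which I have not verified against my own archives: the ZIP format stores the total number of entries in the "end of central directory" record at the very end of the file, so the count can be read from the last few bytes without building a ZipArchiveEntry object for every entry. The sketch below assumes a regular, non-ZIP64, single-disk archive; `ZipEntryCounter` and `ReadEntryCount` are names I made up. It returns -1 when it cannot give a trustworthy answer, in which case the caller would fall back to zip.Entries.Count.

```csharp
using System;
using System.IO;

static class ZipEntryCounter
{
    const int EocdMinSize = 22;                  // fixed part of the record
    const int MaxCommentSize = ushort.MaxValue;  // the archive comment follows the record

    // Hypothetical helper: returns the entry count stored in the end of
    // central directory (EOCD) record, or -1 if it is missing or the
    // archive needs ZIP64 (count stored as 0xFFFF).
    public static int ReadEntryCount(string path)
    {
        using (FileStream fs = File.OpenRead(path))
        {
            // The record sits within the last 22 + 65535 bytes of the file.
            int tailSize = (int)Math.Min(fs.Length, EocdMinSize + MaxCommentSize);
            if (tailSize < EocdMinSize) return -1;

            byte[] tail = new byte[tailSize];
            fs.Seek(-tailSize, SeekOrigin.End);
            int read = 0;
            while (read < tailSize)
            {
                int n = fs.Read(tail, read, tailSize - read);
                if (n == 0) return -1;
                read += n;
            }

            // Scan backwards for the signature bytes 50 4B 05 06.
            for (int i = tailSize - EocdMinSize; i >= 0; i--)
            {
                if (tail[i] != 0x50 || tail[i + 1] != 0x4B ||
                    tail[i + 2] != 0x05 || tail[i + 3] != 0x06)
                    continue;

                // The comment length (offset 20, little-endian) must end exactly
                // at the end of the file; otherwise treat it as a false match.
                int commentLength = tail[i + 20] | (tail[i + 21] << 8);
                if (i + EocdMinSize + commentLength != tailSize) continue;

                // Total number of central directory records (offset 10, little-endian).
                int total = tail[i + 10] | (tail[i + 11] << 8);
                return total == 0xFFFF ? -1 : total;
            }
            return -1;
        }
    }
}
```

Like the numbers 7-Zip shows, this count includes folder entries as well as files. Whether it actually beats zip.Entries.Count over a slow connection is something I would have to measure.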

Juv