6

I wrote a C# program that processes about 30 zipped folders containing about 35,000 files in total. I need to read every file and process its contents. Currently my code extracts all the folders and then reads the files. The problem is that this takes about 15-20 minutes, which is far too long.

I am using the following code to extract files:

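// Assumes DotNetZip (using Ionic.Zip;), which provides ZipFile.Read and ExtractAll.
void ExtractFile(string zipfile, string path)
{
    // Dispose the archive so its file handle is released after extraction.
    using (ZipFile zip = ZipFile.Read(zipfile))
    {
        zip.ExtractAll(path);
    }
}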

The extraction is the part that takes the most time, and I need to reduce it. Is there a way to read the contents of the files inside a zipped folder without extracting them? Or does anyone know another way to reduce the running time of this code?

Thanks in advance

George Mauer

3 Answers

2

Instead of extracting the archive to the hard disk, try reading it in place: open it with ZipFile.OpenRead, then call the ZipArchiveEntry.Open method on each entry to get a stream over its contents.
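A minimal sketch of that approach, assuming the System.IO.Compression types available from .NET Framework 4.5 (note that this ZipFile is a different class from the DotNetZip one used in the question). The class name ZipReader and the method ReadEntries are hypothetical names chosen for illustration, and the sketch assumes the entries are text files:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression; // on .NET Framework, reference System.IO.Compression and System.IO.Compression.FileSystem

public static class ZipReader
{
    // Hypothetical helper: lazily yields (path inside archive, text content)
    // for every file entry, without writing anything to disk.
    public static IEnumerable<KeyValuePair<string, string>> ReadEntries(string zipPath)
    {
        using (ZipArchive archive = ZipFile.OpenRead(zipPath))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                // Directory entries have an empty Name; there is nothing to read.
                if (string.IsNullOrEmpty(entry.Name)) continue;

                // Open() returns a stream that decompresses the entry as it is read.
                using (Stream stream = entry.Open())
                using (var reader = new StreamReader(stream))
                {
                    yield return new KeyValuePair<string, string>(entry.FullName, reader.ReadToEnd());
                }
            }
        }
    }
}
```

Each entry is still decompressed, but into memory rather than onto the file system, so the cost of creating and then re-reading thousands of small files is avoided. Iterating with foreach over ZipReader.ReadEntries("myzip.zip") processes one file at a time instead of holding all of them in memory.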

Also have a look at the CodeFluent Runtime tool, which claims to be optimized for performance.

cubitouch
  • But with OpenRead, can I read the contents of the files inside the zipped file? Suppose I have a zip file myzip.zip with my.txt inside it. Can I read the data inside my.txt without extracting the file? – user2945623 Jan 27 '14 at 17:41
  • @user2945623 See my edit on the ZipArchiveEntry.Open() method (the entry is still decompressed, but into a stream rather than to a directory, which can consume more CPU and be constrained by hard drive access time). – cubitouch Jan 27 '14 at 17:43
2

You could try reading each entry into a memory stream instead of to the file system:

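// Assumes DotNetZip (using Ionic.Zip; using System.IO;).
using (ZipFile zip = ZipFile.Read(zipfile))
{
    foreach (ZipEntry entry in zip.Entries)
    {
        if (entry.IsDirectory) continue; // nothing to read for directory entries

        using (MemoryStream ms = new MemoryStream())
        {
            entry.Extract(ms);             // decompress into memory, not onto disk
            ms.Seek(0, SeekOrigin.Begin);  // rewind before reading
            // read from the stream
        }
    }
}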
D Stanley
0

Try breaking the work into separate async methods that you await, started one by one, whenever a single operation takes longer than 50 ms. http://msdn.microsoft.com/en-us/library/hh191443.aspx

For example, if we have 10 operations that are currently called one after another, with async/await we can run them in parallel, and the total time will depend only on the machine's resources.
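A rough sketch of what running the archives in parallel could look like. Note that await on its own does not add parallelism; here the concurrency comes from Task.Run, which queues each archive on the thread pool, and Task.WhenAll, which waits for all of them. The names ParallelZipProcessor, CountBytes, and ProcessAllAsync are hypothetical, and counting bytes stands in for whatever per-file processing is actually needed:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression; // on .NET Framework, reference System.IO.Compression and System.IO.Compression.FileSystem
using System.Linq;
using System.Threading.Tasks;

public static class ParallelZipProcessor
{
    // Hypothetical per-archive work: total uncompressed bytes read from all file entries.
    public static long CountBytes(string zipPath)
    {
        long total = 0;
        byte[] buffer = new byte[81920];
        using (ZipArchive archive = ZipFile.OpenRead(zipPath))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                if (string.IsNullOrEmpty(entry.Name)) continue; // skip directory entries

                using (Stream stream = entry.Open())
                {
                    int read;
                    while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
                    {
                        total += read;
                    }
                }
            }
        }
        return total;
    }

    // Starts one thread-pool task per archive, then awaits them all.
    public static async Task<long> ProcessAllAsync(IEnumerable<string> zipPaths)
    {
        Task<long>[] tasks = zipPaths
            .Select(path => Task.Run(() => CountBytes(path)))
            .ToArray();

        long[] results = await Task.WhenAll(tasks);
        return results.Sum();
    }
}
```

Whether this is faster depends on where the time goes: decompression is CPU-bound and can benefit from multiple cores, while a single slow disk may gain little from concurrent reads.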

BorHunter
  • This wouldn't speed up things at all. At best it might make things more responsive, but async/await cannot possibly decrease total time. – George Mauer Jan 27 '14 at 17:45
  • @BorHunter - Do you mean concurrent, like a parallel for? – StingyJack Jan 27 '14 at 18:42
  • @StingyJack Yes. For example, if we have 10 operations that are called one after another, with async/await we can run them in parallel, and the total time will depend only on the machine's resources. – BorHunter Jan 27 '14 at 21:31
  • You may want to update your answer to include that. It is possible to use async/await in a scenario where the tasks are not concurrent, and I think that is what George Mauer is pointing out. – StingyJack Jan 28 '14 at 21:48