DotNetZip creating zip from subset of other zip

Question

I have a big zip file that I need to split into multiple zip files. In the method I'm writing now, I have a List object (entries) holding the ZipEntry objects that should go into one new zip file.

This is the code I have got:

 // All files have the same base file name.
 string basefilename = Path.GetFileNameWithoutExtension(entries[0].FileName);
 MemoryStream memstream = new MemoryStream();
 ZipFile zip = new ZipFile();
 foreach (var entry in entries)
 {
    string newFileName = basefilename + Path.GetExtension(entry.FileName);
    zip.AddEntry(newFileName, entry.OpenReader());
 }

 zip.Save(memstream);

 // This will later go in a file I/O handler class.
 FileStream outstream = File.OpenWrite(@"c:\files\"+basefilename+ ".zip");
 memstream.WriteTo(outstream);
 outstream.Flush();
 outstream.Close();

And this is the error I get at the Save() call:

 {Ionic.Zlib.ZlibException: Bad state (invalid block type)
    at Ionic.Zlib.InflateManager.Inflate(FlushType flush)
    at Ionic.Zlib.ZlibCodec.Inflate(FlushType flush)
    at Ionic.Zlib.ZlibBaseStream.Read(Byte[] buffer, Int32 offset, Int32 count)
    at Ionic.Zlib.DeflateStream.Read(Byte[] buffer, Int32 offset, Int32 count)
    at Ionic.Crc.CrcCalculatorStream.Read(Byte[] buffer, Int32 offset, Int32 count)
    at Ionic.Zip.SharedUtilities.ReadWithRetry(Stream s, Byte[] buffer, Int32 offset, Int32 count, String FileName)
    at Ionic.Zip.ZipEntry._WriteEntryData(Stream s)
    at Ionic.Zip.ZipEntry.Write(Stream s)
    at Ionic.Zip.ZipFile.Save()
    at Ionic.Zip.ZipFile.Save(Stream outputStream)
    at

What am I doing wrong?

Which line is causing the error? – sq33G, Oct 31 '11 at 20:24

Cheeso · Accepted Answer · 2011-11-01T17:43:09.087

Here's what you're doing wrong: you have multiple pending calls to ZipEntry.OpenReader() on a single ZipFile instance. You can have at most one pending ZipEntry.OpenReader() at a time.

Here's why: only one Stream object is created when you instantiate a ZipFile with ZipFile.Read() or new ZipFile(), passing the name of an existing file. When you call ZipEntry.OpenReader(), it results in a Seek() on that Stream object, to move the file pointer to the beginning of the compressed bytestream for that particular entry. When you call ZipEntry.OpenReader() again, it results in another Seek() to a different location in the same stream. So by adding entries and calling OpenReader() in succession, you are calling Seek() repeatedly, but only the last one remains valid: the stream cursor is left at the start of the data for the entry from the last call to ZipEntry.OpenReader(). When Save() then reads the other entries, their readers decompress bytes from the wrong position in the file, which is why the inflater reports "Bad state (invalid block type)".
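The shared-cursor effect can be reproduced without DotNetZip at all. The sketch below is only an analogy, not DotNetZip's actual code: a MemoryStream stands in for the single file stream a ZipFile holds, the layout ("entry A" at offset 0, "entry B" at offset 5) is hypothetical, and "opening a reader" is modeled as a Seek() to the entry's offset. Because both seeks act on the same stream, reading "entry A" afterwards returns entry B's bytes.

```csharp
using System.IO;
using System.Text;

public static class SharedCursorDemo
{
    // Reads up to `count` bytes from wherever the shared cursor currently is.
    private static string ReadFromCursor(Stream shared, int count)
    {
        var buffer = new byte[count];
        int read = shared.Read(buffer, 0, count);
        return Encoding.ASCII.GetString(buffer, 0, read);
    }

    public static string Run()
    {
        // One backing stream, standing in for the single stream a ZipFile uses.
        // Hypothetical layout: entry A's data at offset 0, entry B's at offset 5.
        var shared = new MemoryStream(Encoding.ASCII.GetBytes("AAAAABBBBB"));
        const long offsetA = 0;
        const long offsetB = 5;

        shared.Seek(offsetA, SeekOrigin.Begin); // "open a reader" for entry A
        shared.Seek(offsetB, SeekOrigin.Begin); // "open a reader" for entry B

        // Now consume "entry A": B's seek moved the shared cursor,
        // so this returns B's bytes instead of A's.
        return ReadFromCursor(shared, 5);
    }
}
```

SharedCursorDemo.Run() returns "BBBBB". In the real case the misplaced bytes are compressed data, so the inflater, rather than silently returning the wrong content, fails on what it takes to be an invalid deflate block.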

To fix it: Scrap your approach. The simplest way to create a new zipfile with fewer entries than an existing zip file is this: instantiate a ZipFile by reading the existing file, then remove the entries you don't want, then call ZipFile.Save() to a new path.

using (var zip = ZipFile.Read("c:\\dir\\path\\to\\existing\\zipfile.zip")) 
{
    foreach (var name in namesToRemove) // IEnumerable<String>
    {
       zip.RemoveEntry(name); // removes the entry from this ZipFile instance only
    }
    zip.Save("c:\\path\\to\\new\\Archive.zip");
}

EDIT
What this does at the time you call Save(): the library reads, from the original zip file on disk, the raw compressed data for the entries you have NOT removed, and writes it into a new archive file. This is really fast because it does not decompress and recompress each entry in order to put it into the new, smaller zip file. Basically, it reads slices of binary data out of the original zip file and concatenates them to form the new, smaller zip file.

To produce multiple smaller files, you can do this repeatedly with the original zip file; just wrap the above in a loop and vary the files you remove, and the filename of the new, smaller archive. Reading an existing zipfile is also pretty fast.
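A minimal sketch of the read, remove, save approach wrapped in a reusable method, assuming the DotNetZip package (namespace Ionic.Zip) is referenced. ZipSplitter and SaveSubset are hypothetical names; the DotNetZip members used are ZipFile.Read(), the Entries collection, ZipEntry.FileName, RemoveEntry(string) and Save(string). Instead of listing the names to remove, it takes the names to keep and removes everything else, which is convenient when each smaller zip is described by what it should contain.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Ionic.Zip; // DotNetZip (assumed to be referenced)

public static class ZipSplitter
{
    // Writes a new archive containing only the entries named in namesToKeep.
    // The original file is not modified; the kept entries are written to newPath.
    public static void SaveSubset(string originalPath,
                                  ICollection<string> namesToKeep,
                                  string newPath)
    {
        using (var zip = ZipFile.Read(originalPath))
        {
            // Materialize the list first so zip.Entries is not modified
            // while it is being enumerated.
            List<string> namesToRemove = zip.Entries
                .Select(e => e.FileName)
                .Where(n => !namesToKeep.Contains(n))
                .ToList();

            foreach (var name in namesToRemove)
            {
                zip.RemoveEntry(name);
            }

            zip.Save(newPath);
        }
    }
}
```

To produce several smaller archives, call it once per subset, for example with the hypothetical names SaveSubset(@"c:\files\original.zip", new[] { "r1.cfg", "r1.txt", "r1.htm" }, @"c:\files\SmallerZip0001.zip"). Each call re-reads the original archive, which, as noted above, is fast.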

As an alternative, you could decompress and extract each entry, and then recompress and write it into a new zip file. That is the long way around, but it is possible. In that case, for each smaller zip file you want to create, you need two ZipFile instances. Open the first one by reading the original zip archive. For each entry you want to keep, create a MemoryStream, extract the entry's content into that MemoryStream, and remember to call Seek() on the MemoryStream to reset its cursor to the beginning. Then, using the second ZipFile instance, call AddEntry() with that MemoryStream as the source for the added entry. Call ZipFile.Save() only on the second instance.

using (var orig = ZipFile.Read("C:\\whatever\\OriginalArchive.zip"))
{
    using (var smaller = new ZipFile())
    {
      foreach (var name in entriesToKeep) 
      { 
         var ms = new MemoryStream();
         orig[name].Extract(ms); // extract into stream
         ms.Seek(0,SeekOrigin.Begin);
         smaller.AddEntry(name,ms);
      }
      smaller.Save("C:\\location\\of\\SmallerZip.zip");
    }   
}

This works, but it involves decompression and recompression of each entry that goes into the smaller zip, which is inefficient and unnecessary.

If you don't mind the inefficiency of decompressing and recompressing, there's another alternative: call the ZipFile.AddEntry() overload that accepts opener and closer delegates. This defers the call to OpenReader() until the entry is actually being written to the new, smaller zip file, so you have just one pending OpenReader() at a time.

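using (ZipFile original = ZipFile.Read("C:\\path.to\\original\\Archive.zip"),
       smaller = new ZipFile())
{
    foreach (var name in entriesToKeep)
    {
        // The opener runs only when Save() writes this entry, so at most
        // one OpenReader() on the original archive is pending at any time.
        smaller.AddEntry(name,
                         (entryName) => original[entryName].OpenReader(),
                         null);
    }

    smaller.Save("C:\\path.to\\smaller\\Archive.zip");
}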

It's still inefficient, because each entry gets decompressed and recompressed, but it's a little less inefficient than the MemoryStream version, since no entry has to be buffered in memory in full.

Probably the problem is indeed the OpenReader(). The thing is, I need to split one zip file into maybe 20 smaller zip files. A customer uploads one zip file with, for example, 60 files via a website to a drop location. When we process this zip file, there are .cfg, .txt and .htm files in it. Each CFG file results in a database record, and each database record needs an attached zip file containing a cfg, txt and htm file. — Patrick, Nov 01 '11 at 07:37
Sounds like you need to force the new zip to read each old entry immediately, rather than building it a list and telling it to read/write them all in one go. Would including save() in your loop accomplish that? — sq33G, Nov 01 '11 at 08:15
@Patrick, I don't know why your "thing" makes the proposed solution unworkable for you. Start with a zip file; then, for each smaller zip file you want, go through the code I showed above: open the original zip file, remove the entries you don't want in the smaller zip, then save to "SmallerZipNNNN.zip", where NNNN is the number of the smaller zip. — Cheeso, Nov 01 '11 at 12:01
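For Patrick's case (one .cfg, .txt and .htm file per database record), the names to keep for each smaller zip can be derived by grouping the original archive's entry names by base file name. This is a hypothetical helper in plain .NET with no DotNetZip calls; it assumes that files belonging together share a base name, as the question's code does, and that the entry names are collected from the original archive beforehand.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class EntryGrouping
{
    // Groups entry names by file name without extension, so that
    // "r1.cfg", "r1.txt" and "r1.htm" end up in the same group.
    // Comparison is case-insensitive, since Windows file names usually are.
    public static Dictionary<string, List<string>> GroupByBaseName(IEnumerable<string> entryNames)
    {
        return entryNames
            .GroupBy(n => Path.GetFileNameWithoutExtension(n), StringComparer.OrdinalIgnoreCase)
            .ToDictionary(g => g.Key, g => g.ToList(), StringComparer.OrdinalIgnoreCase);
    }
}
```

Each group's list can then serve as the set of names to keep when saving one SmallerZipNNNN.zip from the original archive, so the original is split into one small zip per .cfg record.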

score 1 · Answer 2 · answered Nov 01 '11 at 08:51

Cheeso pointed out that I can't have multiple readers open, though his solution of removing entries was not what I needed. So I used this new knowledge to solve the problem, and this is what I came up with:

using System.IO;
using Ionic.Crc;
using Ionic.Zip;

string basefilename = Path.GetFileNameWithoutExtension(entries[0].FileName);
using (ZipFile zip = new ZipFile())
{
    foreach (var entry in entries)
    {
        // Read this entry completely into memory before the next
        // OpenReader() call, so only one reader is ever pending.
        CrcCalculatorStream reader = entry.OpenReader();
        MemoryStream memstream = new MemoryStream();
        reader.CopyTo(memstream);
        byte[] bytes = memstream.ToArray();
        string newFileName = basefilename + Path.GetExtension(entry.FileName);
        zip.AddEntry(newFileName, bytes);
    }

    zip.Save(@"c:\files\" + basefilename + ".zip");
}

Patrick - I'm glad you found a solution that works. What you're doing here extracts and decompresses entries, then compresses the entries into a new zip. As you said, it works. The approach I proposed will also work, except it doesn't decompress and recompress anything. It simply writes a new zip file, skipping the entries that you *don't want.* — Cheeso, Nov 01 '11 at 11:56

Akron · Answer 3 · 2011-10-31T21:26:31.033

EDIT 2: I think you need the double backslash when specifying the pathname. I updated my code to reflect this. In a regular string literal, the double backslash is the escape sequence for a single backslash.

EDIT: Does the variable "newFileName" represent the path where the file is currently located? If this variable is something else, then that could be your problem. Without seeing more of the surrounding code, I'm not sure.

I use the same library to make .zip files all the time in my code, but I have never done it exactly the way you are trying to. I don't know why your code is giving you an exception, but maybe this will work instead (assuming your strings/pathnames are all correct and the zip library really is what was causing the issue):

using (ZipFile zip = new ZipFile())
{
   zip.CompressionLevel = CompressionLevel.BestCompression;
   foreach (var entry in entries)
   {
      try
      {
         string newFileName = basefilename + Path.GetExtension(entry.FileName);
         zip.AddFile(newFileName, "");
      }
      catch (Exception) { }
   }
   zip.Save("c:\\files\\"+basefilename+ ".zip");
}

The @ before a string makes the double \\ unnecessary. I think it is the problem Cheeso described: I can't have multiple OpenReaders pending. — Patrick, Nov 01 '11 at 07:40
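To make Patrick's point concrete: in C#, "\\" in a regular string literal and "\" in a verbatim (@) literal produce the same single backslash, so both spellings of the question's output directory are identical strings and the path was not the cause of the exception. PathLiterals and ZipPath are illustrative names only.

```csharp
public static class PathLiterals
{
    // Regular literal: each "\\" escape sequence yields one backslash.
    public const string Escaped = "c:\\files\\";

    // Verbatim literal: backslashes are taken as-is (only "" needs escaping).
    public const string Verbatim = @"c:\files\";

    // Builds the output path the same way the question's code does.
    public static string ZipPath(string basefilename)
    {
        return Verbatim + basefilename + ".zip";
    }
}
```

Both constants are the nine-character string c:\files\, so swapping one spelling for the other changes nothing at run time.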
