Extract zip entries to another Zip file

Question

Can anyone tell me why the following code doesn't work? I am using the SharpZipLib API for the zip streams (latest version, downloaded today from their site). I'm attempting to merge the contents of one zip file into another without performing any IO on the disk, because the zip files may contain file names that are reserved on Windows. I have tried this with multiple source and destination zip files, both with and without reserved names. The code does not throw any exception, and if you inspect the buffer prior to each write operation you can see that it contains real data. Yet after the entire operation finishes, the size of the target zip file has not changed, and exploring it confirms that none of the files the code is supposed to add are actually in the destination file. :(

    public static void CopyToZip(string inArchive, string outArchive)
    {

        ZipOutputStream outStream = null;
        ZipInputStream inStream = null;
        try
        {
            outStream = new ZipOutputStream(File.OpenWrite(outArchive));
            outStream.IsStreamOwner = false;
            inStream = new ZipInputStream(File.OpenRead(inArchive));
            ZipEntry currentEntry = inStream.GetNextEntry();
            while (currentEntry != null)
            {

                byte[] buffer = new byte[1024];
                ZipEntry newEntry = new ZipEntry(currentEntry.Name);
                newEntry.Size = currentEntry.Size;
                newEntry.DateTime = currentEntry.DateTime;
                outStream.PutNextEntry(newEntry);
                int size = 0;
                while ((size = inStream.Read(buffer, 0, buffer.Length)) > 0)
                {
                    outStream.Write(buffer, 0, size);
                }
                outStream.CloseEntry();

                currentEntry = inStream.GetNextEntry();
            }
            outStream.IsStreamOwner = true;
        }
        catch (Exception e)
        {
            throw e;
        }
        finally
        {
            try { outStream.Close(); }
            catch (Exception ignore) { }
            try { inStream.Close(); }
            catch (Exception ignore) { }
        }      
    }

score 1 · Answer 1 · answered Aug 08 '12 at 05:06

I ended up doing this using a different API: DotNetZip, from http://dotnetzip.codeplex.com/. Here is the implementation:

    public static void CopyToZip(string inArchive, string outArchive, string tempPath)
    {
        ZipFile inZip = null;
        ZipFile outZip = null;

        try
        {
            inZip = new ZipFile(inArchive);
            outZip = new ZipFile(outArchive);
            List<string> tempNames = new List<string>();
            List<string> originalNames = new List<string>();
            int I = 0;
            foreach (ZipEntry entry in inZip)
            {
                if (!entry.IsDirectory)
                {
                    string tempName = Path.Combine(tempPath, "tmp.tmp");
                    string oldName = entry.FileName;
                    byte[] buffer = new byte[4026];
                    Stream inStream = null;
                    FileStream stream = null;
                    try
                    {
                        inStream = entry.OpenReader();
                        stream = new FileStream(tempName, FileMode.Create, FileAccess.ReadWrite);
                        int size = 0;
                        while ((size = inStream.Read(buffer, 0, buffer.Length)) > 0)
                        {
                            stream.Write(buffer, 0, size);
                        }
                        inStream.Close();
                        stream.Flush();
                        stream.Close();
                        inStream = new FileStream(tempName, FileMode.Open, FileAccess.Read);

                        outZip.AddEntry(oldName, inStream);
                        outZip.Save();
                    }
                    catch (Exception exe)
                    {
                        throw exe;
                    }
                    finally
                    {
                        try { inStream.Close(); }
                        catch (Exception ignore) { }
                        try { stream.Close(); }
                        catch (Exception ignore) { }
                    }
                }
            }

        }
        catch (Exception e)
        {
            throw e;
        }
    }

It gets around the file name problem by extracting each entry to a temp file (tmp.tmp) in a given directory, then giving the data its original name when it is added to the archive. – Mark W Aug 08 '12 at 05:07
Your code will work fine, but I have a couple of notes for you. 1. Your code saves the output file once for each entry that gets added. This is unnecessary. 2. You can avoid writing to the filesystem entirely. See my answer for details. – Cheeso Oct 30 '12 at 17:58
I had to call Save() there :(. During my tests, when I queued up the list of files and tried to add them all with a single call to the Save() method, the file remained unchanged for whatever reason. I am well aware of the gigantic performance hit I took by saving where I do, but unfortunately I couldn't make it work the appropriate way. – Mark W Nov 05 '12 at 19:15

score 0 · Answer 2 · answered Aug 07 '12 at 20:09

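One issue that I see is that you are opening the output zip file using File.OpenWrite(). That opens the existing file positioned at its start, so the ZipOutputStream writes a brand-new archive over the old contents rather than merging new entries into the existing archive.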

The SharpDevelop Wiki has an example of updating a zip file using memory streams. It can be found at http://wiki.sharpdevelop.net/SharpZipLib_Updating.ashx#Updating_a_zip_file_in_memory_1
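As a rough illustration of that update-style approach, here is a minimal, untested sketch. It assumes SharpZipLib's `ZipFile` update API (`BeginUpdate`, `Add(IStaticDataSource, string)`, `CommitUpdate`) and .NET 4 or later for `Stream.CopyTo`. The names `MemoryDataSource` and `ZipMerge` are made up for this example and are not part of the library. Each entry is buffered in memory, so nothing is extracted to disk under its original name.

```csharp
using System.IO;
using ICSharpCode.SharpZipLib.Zip;

// Hypothetical helper: exposes an in-memory buffer as a SharpZipLib data source.
class MemoryDataSource : IStaticDataSource
{
    private readonly byte[] data;

    public MemoryDataSource(byte[] data) { this.data = data; }

    // Called by the library when it needs the bytes for the entry.
    public Stream GetSource() { return new MemoryStream(data); }
}

static class ZipMerge
{
    public static void CopyToZip(string inArchive, string outArchive)
    {
        using (ZipFile inZip = new ZipFile(inArchive))
        using (ZipFile outZip = new ZipFile(outArchive))
        {
            outZip.BeginUpdate();
            foreach (ZipEntry entry in inZip)
            {
                if (!entry.IsFile) continue; // skip directory entries

                // Decompress the source entry into memory, never onto disk.
                using (Stream source = inZip.GetInputStream(entry))
                using (MemoryStream copy = new MemoryStream())
                {
                    source.CopyTo(copy);
                    outZip.Add(new MemoryDataSource(copy.ToArray()), entry.Name);
                }
            }
            // Queued additions are only written when the update is committed.
            outZip.CommitUpdate();
        }
    }
}
```

Buffering whole entries keeps the sketch short but costs memory on large files; how entries with names that already exist in the destination are handled is not covered here.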

Thierry

I followed those examples to create the code that I posted here. At this point it doesn't matter whether it overwrites or not; the above code doesn't change the state of the destination zip file at all. If it were overwriting, at least I would have something to debug... I don't need to use a memory stream. What I mean by 'without using the disk' is that I can't extract the contents of one zip onto the hard drive and then repack them into another zip. The data has to move from one zip to the next without being unpacked on the HD, because of Windows file naming conventions. – Mark W Aug 07 '12 at 21:38
I used new FileStream(outArchive, FileMode.Open) in place of File.OpenWrite(). It didn't change the result at all. – Mark W Aug 07 '12 at 21:45

score 0 · Answer 3 · answered Oct 30 '12 at 18:06

The following is some simpler code that will read from the input zip and write to the output zip, which potentially already exists. It does not require writing temporary data to the filesystem.

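    // Needs: using System; using System.IO; using Ionic.Zip; (DotNetZip)
    public static void CopyToZip(string inArchive, string outArchive)
    {
        using (ZipFile inZip = new ZipFile(inArchive),
                       outZip = new ZipFile(outArchive))
        {
            // Builds a fresh opener for each entry. The inner lambda captures
            // 'name', so the right entry is opened later, during Save().
            Func<string, OpenDelegate> getInStreamReturner = (name) => {
                return new OpenDelegate(a => inZip[name].OpenReader());
            };
            foreach (ZipEntry entry in inZip)
            {
                if (!entry.IsDirectory)
                {
                    string zipEntryName = entry.FileName;
                    // Opener and closer are not invoked here, only at Save().
                    outZip.AddEntry(zipEntryName,
                                    getInStreamReturner(zipEntryName),
                                    (name, stream) => stream.Close());
                }
            }
            // A single Save() writes all queued entries.
            outZip.Save();
        }
    }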

Notes:

This approach uses the ZipFile.AddEntry overload that accepts two delegates: an opener and a closer. These functions get called at the time of ZipFile.Save. The former delegate needs to open and return the stream that contains the data to be zipped. The latter delegate needs to just close the stream.
It is necessary to define the getInStreamReturner Func in order to open the right stream at the time of ZipFile.Save. Bear in mind that zipEntryName changes value each time through the loop. Also, ZipEntry.OpenReader() opens a stream on the actual zip data, which reads and decompresses as it goes. You can have only one of those open at any one time per ZipFile. getInStreamReturner creates a new function each time through the loop, thereby creating a closure that retains the value of zipEntryName for reference at the time of ZipFile.Save.
This approach will fail if there are name clashes between inArchive and outArchive. To avoid that, you'd need to check for duplicates before adding: either contrive a new, unique name, or skip entries whose names already exist in outArchive.
I haven't tested this.
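For the name-clash note, one way to contrive a unique name is sketched below. `EntryNames.MakeUnique` is a hypothetical helper, not part of DotNetZip. It works on plain strings, assumes forward slashes as the path separator inside the archive, and deliberately avoids `System.IO.Path` so that reserved Windows names are never interpreted as filesystem paths.

```csharp
using System.Collections.Generic;

static class EntryNames
{
    // Hypothetical helper: returns entryName unchanged if it is free,
    // otherwise appends " (1)", " (2)", ... before the extension.
    public static string MakeUnique(string entryName, ICollection<string> taken)
    {
        if (!taken.Contains(entryName)) return entryName;

        int slash = entryName.LastIndexOf('/');
        int dot = entryName.LastIndexOf('.');
        // A dot counts as an extension separator only when it sits inside
        // the last path segment and is not that segment's first character.
        bool hasExt = dot > slash + 1;
        string stem = hasExt ? entryName.Substring(0, dot) : entryName;
        string ext = hasExt ? entryName.Substring(dot) : "";

        for (int i = 1; ; i++)
        {
            string candidate = stem + " (" + i + ")" + ext;
            if (!taken.Contains(candidate)) return candidate;
        }
    }
}
```

In the loop above, the caller would collect the names already present in the output archive (for instance from DotNetZip's `EntryFileNames` collection, if the installed version exposes it), pass the result of `MakeUnique` to `AddEntry` while still opening the source entry by its original name, and add each chosen name to `taken`. A set built with `StringComparer.OrdinalIgnoreCase` makes the check case-insensitive.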

While this approach does not write to the filesystem, it does decompress and recompress file data. There is an open request to provide a feature to DotNetZip to migrate entries without that decompress/recompress jump. I haven't implemented that yet.
