
Is there an existing method or will I need to manually parse and skip the exe block before passing the data to ZipInputStream?

James Allman

4 Answers


After reviewing the EXE and ZIP file formats and testing various options, it appears the easiest solution is to ignore any preamble up to the first ZIP local file header.

Zip file layout

Zip local file header

I wrote an input stream filter to bypass the preamble and it works perfectly:

ZipInputStream zis = new ZipInputStream(
    new WinZipInputStream(
    new FileInputStream("test.exe")));
ZipEntry ze;
while ((ze = zis.getNextEntry()) != null) {
    . . .
    zis.closeEntry();
}
zis.close();

WinZipInputStream.java

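import java.io.FilterInputStream;
import java.io.InputStream;
import java.io.IOException;

public class WinZipInputStream extends FilterInputStream {
    public static final byte[] ZIP_LOCAL = { 0x50, 0x4b, 0x03, 0x04 };
    protected int ip; // signature bytes matched so far while scanning the preamble
    protected int op; // signature bytes already replayed to the caller

    public WinZipInputStream(InputStream is) {
        super(is);
    }

    @Override
    public int read() throws IOException {
        // Discard everything up to and including the first local file header signature.
        while (ip < ZIP_LOCAL.length) {
            int c = super.read();
            if (c == -1) return -1; // end of stream before a signature was found
            if (c == ZIP_LOCAL[ip]) {
                ip++;
            } else {
                // The mismatched byte may itself start a new match.
                ip = (c == ZIP_LOCAL[0]) ? 1 : 0;
            }
        }

        // Replay the consumed signature, then pass the remaining data through.
        if (op < ZIP_LOCAL.length)
            return ZIP_LOCAL[op++];
        else
            return super.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (op == ZIP_LOCAL.length) return super.read(b, off, len);
        if (len == 0) return 0;
        int l = 0;
        while (l < Math.min(len, ZIP_LOCAL.length)) {
            int c = read();
            if (c == -1) break;
            b[off + l++] = (byte) c; // honor the caller's offset
        }
        return l == 0 ? -1 : l;
    }
}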
James Allman

The nice thing about ZIP files is their sequential structure: every entry is an independent bunch of bytes, and at the end is a Central Directory Index that lists all entries and their offsets in the file.

The bad thing is, the java.util.zip.* classes ignore that index and just start reading into the file and expect the first entry to be a Local File Header block, which isn't the case for self-extracting ZIP archives (these start with the EXE part).
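To illustrate the behavior described above, here is a minimal in-memory sketch. It assumes that a zero-filled block stands in for the EXE part; the class and method names (PreambleDemo, buildZip, withPreamble, firstEntryName) are made up for this example. On the JDKs I am aware of, ZipInputStream.getNextEntry() simply returns null when the stream does not start with a Local File Header, so the preamble makes the archive look empty rather than raising an error.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, for illustration only.
class PreambleDemo {
    // Builds a one-entry ZIP archive in memory.
    static byte[] buildZip() {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ZipOutputStream zos = new ZipOutputStream(bos)) {
                zos.putNextEntry(new ZipEntry("hello.txt"));
                zos.write("hello".getBytes(StandardCharsets.US_ASCII));
                zos.closeEntry();
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Prepends a zero-filled block as a stand-in for the EXE part (assumption).
    static byte[] withPreamble(byte[] zip, int preambleLength) {
        byte[] out = new byte[preambleLength + zip.length];
        System.arraycopy(zip, 0, out, preambleLength, zip.length);
        return out;
    }

    // Returns the name of the first entry ZipInputStream sees, or null if it sees none.
    static String firstEntryName(byte[] data) {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(data))) {
            ZipEntry ze = zis.getNextEntry();
            return ze == null ? null : ze.getName();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling firstEntryName(buildZip()) finds the entry, while firstEntryName(withPreamble(buildZip(), 64)) finds nothing.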

Some years ago, I wrote a custom ZIP parser to extract individual ZIP entries (LFH + data) that relied on the CDI to find where these entries were in the file. I just checked, and it can actually list the entries of a self-extracting ZIP archive without further ado and give you the offsets -- so you could either:

  1. use that code to find the first LFH after the EXE part, copy everything after that offset to a different File, and then feed that new File to java.util.zip.ZipFile:

    Edit: Just skipping the EXE part doesn't seem to work. ZipFile still won't read the result, and my native ZIP program complains that the new ZIP file is damaged, reporting exactly the number of bytes I skipped as "missing" (so it actually reads the CDI). I guess some headers would need to be rewritten, so the second approach given below looks more promising -- or

  2. use that code for the full ZIP extraction (it's similar to java.util.zip); this would require some additional plumbing because the code originally wasn't intended as a replacement ZIP library but had a very specific use case (differential updating of ZIP files over HTTP)
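The "missing" bytes mentioned in the edit are what you would expect if the offsets stored in the archive are absolute file positions that still include the EXE part. Here is a rough sketch of how one might check that; it is not part of the project mentioned in this answer. It compares where the central directory actually starts (just before the End of Central Directory record) with where that record says it starts. The names CentralDirectoryCheck and offsetShift are hypothetical, and the sketch assumes a small archive without ZIP64 records or anything else between the central directory and the end record.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical diagnostic class, for illustration only.
class CentralDirectoryCheck {
    static final int EOCD_SIG = 0x06054b50; // "PK\005\006" read as a little-endian int
    static final int EOCD_MIN_SIZE = 22;    // size of the end record without a comment

    // Returns (actual - declared) start of the central directory.
    // 0: offsets match the file. Positive: bytes were prepended without adjusting
    // the offsets. Negative: bytes the offsets account for are missing.
    static long offsetShift(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        // Scan backwards so that the record closest to the end of the file wins.
        for (int eocd = data.length - EOCD_MIN_SIZE; eocd >= 0; eocd--) {
            if (buf.getInt(eocd) != EOCD_SIG) continue;
            long cdSize = buf.getInt(eocd + 12) & 0xFFFFFFFFL;   // central directory size
            long declared = buf.getInt(eocd + 16) & 0xFFFFFFFFL; // declared directory offset
            long actual = eocd - cdSize; // assumes the directory ends right at the end record
            return actual - declared;
        }
        throw new IllegalArgumentException("No End of Central Directory record found");
    }
}
```

For a file produced by cutting N bytes off the front of an archive with absolute offsets, this sketch would return -N, which matches the number of "missing" bytes the ZIP program reported.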

The code is hosted at SourceForge (project page, website) and licensed under the Apache License 2.0, so commercial use is fine -- AFAIK there's a commercial game using it as an updater for its game assets.

The interesting parts to get the offsets from a ZIP file are in Indexer.parseZipFile which returns a LinkedHashMap<Resource, Long> (so the first map entry has the lowest offset in the file). Here's the code I used to list the entries of a self-extracting ZIP archive (created with the WinZIP SE creator with Wine on Ubuntu from an acra release file):

public static void main(String[] args) throws Exception {
    File archive = new File("/home/phil/downloads", "acra-4.2.3.exe");
    Map<Resource, Long> resources = parseZipFile(archive);
    for (Entry<Resource, Long> resource : resources.entrySet()) {
        System.out.println(resource.getKey() + ": " + resource.getValue());
    }
}

You can probably rip out most of the code except for the Indexer class and zip package that contains all the header parsing classes.

Philipp Reichart
    Thanks for the information; it put me on the right track. I ended up writing a simple input filter to ignore anything up to the first local header block. – James Allman Oct 31 '11 at 15:12

There are fake Local File Header markers in some self-extracting ZIP files. I think it's best to scan the file backwards to find the End of Central Directory (EOCD) record. The EOCD record contains the offset of the Central Directory (CD), and the first CD entry contains the offset of the first Local File Header. If you start reading from the first byte of a Local File Header, ZipInputStream works fine.

Obviously the code below is not the fastest solution. If you are going to process large files, you should implement some kind of buffering or use memory-mapped files.

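import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import org.apache.commons.io.EndianUtils;

public class ZipHandler {
    // EOCD signature "PK\005\006", stored reversed because the file is scanned backwards.
    private static final byte[] EOCD_MARKER = { 0x06, 0x05, 0x4b, 0x50 };

    public InputStream openExecutableZipFile(Path zipFilePath) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(zipFilePath.toFile(), "r")) {
            long position = raf.length() - 1;
            int markerIndex = 0;
            byte[] buffer = new byte[4];
            while (position >= 0) {
                raf.seek(position);
                raf.readFully(buffer, 0, 1);
                if (buffer[0] == EOCD_MARKER[markerIndex]) {
                    markerIndex++;
                } else {
                    // The mismatched byte may itself start a new match.
                    markerIndex = (buffer[0] == EOCD_MARKER[0]) ? 1 : 0;
                }
                if (markerIndex == EOCD_MARKER.length) {
                    // The file pointer is one byte into the EOCD record;
                    // the CD offset field starts at byte 16 of that record.
                    raf.skipBytes(15);
                    raf.readFully(buffer, 0, 4);
                    int centralDirectoryOffset = EndianUtils.readSwappedInteger(buffer, 0);
                    raf.seek(centralDirectoryOffset);
                    // The local header offset field starts at byte 42 of the first CD entry.
                    raf.skipBytes(42);
                    raf.readFully(buffer, 0, 4);
                    int localFileHeaderOffset = EndianUtils.readSwappedInteger(buffer, 0);
                    return new SkippingInputStream(Files.newInputStream(zipFilePath), localFileHeaderOffset);
                }
                position--;
            }
            throw new IOException("No EOCD marker found");
        }
    }
}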

public class SkippingInputStream extends FilterInputStream {
    private int bytesToSkip;
    private int bytesAlreadySkipped;

    public SkippingInputStream(InputStream inputStream, int bytesToSkip) {
        super(inputStream);
        this.bytesToSkip = bytesToSkip;
        this.bytesAlreadySkipped = 0;
    }

    @Override
    public int read() throws IOException {
        // Discard the preamble before returning any real data.
        while (bytesAlreadySkipped < bytesToSkip) {
            int c = super.read();
            if (c == -1) {
                return -1;
            }
            bytesAlreadySkipped++;
        }
        return super.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (bytesAlreadySkipped == bytesToSkip) {
            return super.read(b, off, len);
        }
        if (len == 0) {
            return 0;
        }
        int count = 0;
        while (count < len) {
            int c = read();
            if (c == -1) {
                break;
            }
            b[off + count++] = (byte) c; // honor the caller's offset
        }
        // Signal end of stream with -1 instead of a zero-length read.
        return count == 0 ? -1 : count;
    }
}
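As a rough sketch of the buffered variant suggested above, the same lookup can be done on a ByteBuffer, which could be a byte array wrapped in memory or, for a file, a read-only mapping obtained from FileChannel.map. The class name BufferedEocdLocator and its method are hypothetical. The sketch assumes the archive fits in a single buffer (under 2 GB), has at least one entry, uses no ZIP64 records, and stores absolute offsets, as the code above also assumes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper class, for illustration only.
class BufferedEocdLocator {
    static final int EOCD_SIG = 0x06054b50; // "PK\005\006" read as a little-endian int
    static final int CEN_SIG = 0x02014b50;  // "PK\001\002", central directory entry
    static final int EOCD_MIN_SIZE = 22;    // end record without a comment
    static final int CEN_MIN_SIZE = 46;     // central directory entry without a name

    // Returns the offset of the Local File Header named by the first central directory entry.
    static long firstLocalHeaderOffset(ByteBuffer file) {
        // Work on a little-endian view so the caller's buffer state is left untouched.
        ByteBuffer buf = file.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        for (int eocd = buf.limit() - EOCD_MIN_SIZE; eocd >= 0; eocd--) {
            if (buf.getInt(eocd) != EOCD_SIG) continue;
            long cdOffset = buf.getInt(eocd + 16) & 0xFFFFFFFFL;
            // Reject matches whose directory offset does not point at a directory entry.
            if (cdOffset + CEN_MIN_SIZE > buf.limit() || buf.getInt((int) cdOffset) != CEN_SIG) {
                continue;
            }
            return buf.getInt((int) cdOffset + 42) & 0xFFFFFFFFL;
        }
        throw new IllegalArgumentException("No usable End of Central Directory record found");
    }
}
```

The returned offset can then be passed to SkippingInputStream in place of the value computed with RandomAccessFile.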
skuzniarz

TrueZIP works best in this case (at least in my case).

A self-extracting ZIP has the layout code1 header1 file1 ..., while a normal ZIP has the layout header1 file1 .... The leading code is the program that extracts the ZIP.

Note, though, that the TrueZIP extracting utility complains about the extra bytes and throws an exception.

Here is the code

private void Extract(String src, String dst, String incPath) {
    TFile srcFile = new TFile(src, incPath);
    TFile dstFile = new TFile(dst);
    try {
        TFile.cp_rp(srcFile, dstFile, TArchiveDetector.NULL);
    } catch (IOException e) {
        // Handle exception
    }
}

You can call this method like Extract("C:\\2006Production.exe", "C:\\", ""); (note that backslashes must be escaped in Java string literals).

The file is extracted to the C: drive, and you can then perform your own operations on it. I hope this helps.

Thanks.

jaysun