
I'm trying to "pack" several files (previously inside a jar archive) into a single non-jar file by using DataInputStream / DataOutputStream.

The idea was:

    First int = number of entries
    
    First UTF is the first entry name
    
    Second Int is entry byte array length (entry size)

    Then the entry bytes follow, and this repeats for every entry.

The code:

 public static void main(String[] args) throws Throwable {
        test();

        System.out.println("========================================================================================");

        final DataInputStream dataInputStream = new DataInputStream(new FileInputStream(new File("C:\\Users\\Admin\\Desktop\\randomJarOut")));

        for (int int1 = dataInputStream.readInt(), i = 0; i < int1; ++i) {
            final String utf = dataInputStream.readUTF();
            System.out.println("Entry name: " + utf);
            final byte[] array = new byte[dataInputStream.readInt()];
            for (int j = 0; j < array.length; ++j) {
                array[j] = dataInputStream.readByte();
            }
            System.out.println("Entry bytes length: " + array.length);
        }

    }

Unpacking original & packing to new one:

private static void test() throws Throwable {
    JarInputStream stream = new JarInputStream(new FileInputStream(new File("C:\\Users\\Admin\\Desktop\\randomJar.jar")));
    JarInputStream stream1 = new JarInputStream(new FileInputStream(new File("C:\\Users\\Admin\\Desktop\\randomJar.jar")));

    final byte[] buffer = new byte[2048];
    final DataOutputStream outputStream = new DataOutputStream(new FileOutputStream(new File("C:\\Users\\Admin\\Desktop\\randomJarOut")));

    int entryCount = 0;
    for (ZipEntry entry; (entry = stream.getNextJarEntry()) != null; ) {
        entryCount++;
    }

    outputStream.writeInt(entryCount);

    for (JarEntry entry; (entry = stream1.getNextJarEntry()) != null; ) {
        int entryRealSize = stream1.read(buffer);
        if (!(entryRealSize == -1)) {
            System.out.println("Writing: " + entry.getName() + " Length: " + entryRealSize);

            outputStream.writeUTF(entry.getName());
            outputStream.writeInt(entryRealSize);

            for (int len = stream1.read(buffer); len != -1; len = stream1.read(buffer)) {
                outputStream.write(buffer, 0, len);
            }
        }
    }
    outputStream.flush();
    outputStream.close();
}

Apparently I'm able to unpack the first entry without any problems, but the second one and the others fail:

Entry name: META-INF/services/org.jd.gui.spi.ContainerFactory
Entry bytes length: 434
Exception in thread "main" java.io.UTFDataFormatException: malformed input around byte 279
    at java.io.DataInputStream.readUTF(DataInputStream.java:656)
    at java.io.DataInputStream.readUTF(DataInputStream.java:564)
    at it.princekin.esercizio.Bootstrap.main(Bootstrap.java:29)
Disconnected from the target VM, address: '127.0.0.1:54384', transport: 'socket'

Process finished with exit code 1

Does anyone know how to fix this? Why does this work for the first entry but not for the others?

  • I don't know how you're managing to read at all from a *jar* file with a `FileInputStream`/`DataInputStream`. Even if it were not compressed I don't see how that would work, *unless* it's not a jar at all… – g00se Jul 30 '21 at 14:55
  • the original file is a jar, that's why I'm using JarInputStream to unpack the jar and get every entry's bytes & name. – Donald C. Spencer Jul 30 '21 at 14:57
  • I'm referring to *`final DataInputStream dataInputStream = new DataInputStream(new FileInputStream(new File("C:\\Users\\Admin\\Desktop\\randomJarOut.jar")));`* – g00se Jul 30 '21 at 14:58
  • that's not a jar file anymore, that's the packed result. I just gave it the .jar extension to show it's the output of the original jar – Donald C. Spencer Jul 30 '21 at 14:59
  • Well it certainly confused me ;). Shouldn't *`int entryRealSize = stream1.read(buffer);`* be `int entryRealSize = entry.getSize();`? – g00se Jul 30 '21 at 15:10
  • unfortunately entry.getSize() always returns -1, that's why I had to use `read();` – Donald C. Spencer Jul 30 '21 at 15:15

2 Answers


The first thing to check is whether the read and write methods are reciprocal:

  1. The writer method writes with outputStream.writeInt(entryCount) and the main method reads with dataInputStream.readInt(). That is OK.
  2. The writer method writes with outputStream.writeUTF(entry.getName()) and the main method reads with dataInputStream.readUTF(). That is OK.
  3. The writer method writes with outputStream.writeInt(entryRealSize) and the main method reads with dataInputStream.readInt(). That is OK.
  4. The writer method writes with outputStream.write(buffer, 0, len) and the main method reads with dataInputStream.readByte() several times. That works, but it is inefficient.

write(buffer, offset, len) writes exactly len raw bytes onto the output stream, and readByte() reads exactly one raw byte back (DataOutputStream adds no metadata around bytes), so the loop returns the same data. The natural counterpart of a block write is readFully(buffer, offset, len), which reads the whole block in one call; a plain read(buffer, offset, len) may return fewer bytes than requested. The read side is therefore not what causes the malformed-input exception.
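As an illustration of pairing each write with its reciprocal read, here is a minimal in-memory sketch of one entry in the format from the question. The class and method names (EntryRoundTrip, pack, unpack) are made up for this example and are not part of the original program. It reads the data block with readFully, which keeps reading until the array is full.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of the original program.
final class EntryRoundTrip {

    // Writes one entry as: UTF name, int length, raw bytes.
    static byte[] pack(String name, byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(name);
            out.writeInt(data.length);
            out.write(data, 0, data.length);
        }
        return bytes.toByteArray();
    }

    // Reads the entry back with the reciprocal calls and returns its data.
    static byte[] unpack(byte[] packed, String expectedName) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(packed))) {
            String name = in.readUTF();
            if (!name.equals(expectedName)) {
                throw new IOException("Unexpected entry name: " + name);
            }
            byte[] data = new byte[in.readInt()];
            in.readFully(data); // fills the whole array or throws EOFException
            return data;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "hello entry".getBytes(StandardCharsets.UTF_8);
        byte[] restored = unpack(pack("a/b.txt", original), "a/b.txt");
        System.out.println(Arrays.equals(original, restored));
    }
}
```

The round trip only works because the int written before the data is exactly the number of bytes that follow it.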

Bugs in the writer method

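There is also a major bug in the writer method: it calls stream1.read(buffer) once before the loop and treats the result as the entry size, but that call only returns the number of bytes placed in the buffer (at most 2048), and those bytes are never written. The loop then writes only what is left of the entry after that first chunk. The result is that the length written onto the output stream does not match the data that follows it. For a small entry such as the 434-byte one in your output, the length 434 is followed by no data at all, so the reader consumes the next entries' bytes as data and then fails in readUTF. The entry count is also taken over all jar entries, including the directory entries that the second loop skips.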

If you need to know the input file size before writing it in the output stream, you have two choices:

  • Either choose a buffer large enough for the largest entry (like 204800) and keep reading into it until read returns -1, because a single read call may return fewer bytes than the entry contains.
  • Or separate the read and write algorithms: first a method that reads the whole entry into memory (a byte[], for example), and then another method that writes the byte[] onto the output stream.
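The read step of the second choice could look roughly like the sketch below. ReadWholeEntry, readAll and shortReads are names invented for this example; shortReads is only a test stream that returns at most 3 bytes per call, to show why the result of one read call cannot be used as the entry size.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of the original program.
final class ReadWholeEntry {

    // Keeps calling read until it returns -1 and collects every chunk.
    static byte[] readAll(InputStream input) throws IOException {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        byte[] buffer = new byte[2048];
        for (int len = input.read(buffer); len != -1; len = input.read(buffer)) {
            collected.write(buffer, 0, len);
        }
        return collected.toByteArray();
    }

    // Test stream that hands out at most 3 bytes per read call, to mimic short reads.
    static InputStream shortReads(byte[] data) {
        return new FilterInputStream(new ByteArrayInputStream(data)) {
            @Override
            public int read(byte[] b, int off, int len) throws IOException {
                return super.read(b, off, Math.min(len, 3));
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10];
        int firstRead = shortReads(data).read(new byte[2048]);
        int total = readAll(shortReads(data)).length;
        System.out.println("first read: " + firstRead + ", total: " + total);
    }
}
```

Once the whole entry is in a byte[], its length is known and can be written with writeInt before the data. On Java 9 and later, InputStream.readAllBytes() does the same job as readAll here.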

Full fixed solution

I've fixed your program, with specific, decoupled methods for each task. The process consists of parsing the input file into a memory model, writing it to an intermediate file according to your custom format, and then reading it back.

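import java.io.*;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarInputStream;
import java.util.zip.ZipEntry;

// The methods below go inside a class of your choice.
public static void main(String[] args)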
    throws Throwable
{
    File inputJarFile=new File(args[0]);
    File intermediateFile=new File(args[1]);
    List<FileData> fileDataEntries=parse(inputJarFile);
    write(fileDataEntries, intermediateFile);
    read(intermediateFile);
}

public static List<FileData> parse(File inputJarFile)
    throws IOException
{
    List<FileData> list=new ArrayList<>();
    try (JarInputStream stream=new JarInputStream(new FileInputStream(inputJarFile)))
    {
        for (ZipEntry entry; (entry=stream.getNextJarEntry()) != null;)
        {
            byte[] data=readAllBytes(stream);
            if (data.length > 0)
            {
                list.add(new FileData(entry.getName(), data));
            }
            stream.closeEntry();
        }
    }
    return list;
}

public static void write(List<FileData> fileDataEntries, File output)
    throws IOException
{
    try (DataOutputStream outputStream=new DataOutputStream(new FileOutputStream(output)))
    {
        // Header: the number of entries that follow.
        outputStream.writeInt(fileDataEntries.size());

        for (FileData fileData : fileDataEntries)
        {
            int entryRealSize=fileData.getData().length;
            System.out.println("Writing: " + fileData.getName() + " Length: " + entryRealSize);

            // Each entry: name, length, then exactly that many bytes.
            outputStream.writeUTF(fileData.getName());
            outputStream.writeInt(entryRealSize);
            outputStream.write(fileData.getData());
        }
        outputStream.flush();
    }
}

public static void read(File intermediateFile)
    throws IOException
{
    try (DataInputStream dataInputStream=new DataInputStream(new FileInputStream(intermediateFile)))
    {
        for (int entryCount=dataInputStream.readInt(), i=0; i < entryCount; i++)
        {
            String utf=dataInputStream.readUTF();
            int entrySize=dataInputStream.readInt();
            System.out.println("Entry name: " + utf + " size: " + entrySize);
            byte[] data=readFixedLengthBuffer(dataInputStream, entrySize);
            System.out.println("Entry bytes length: " + data.length);
        }
    }
}

private static byte[] readAllBytes(InputStream input)
    throws IOException
{
    byte[] buffer=new byte[4096];
    byte[] total=new byte[0];
    int len;
    do
    {
        len=input.read(buffer);
        if (len > 0)
        {
            byte[] total0=total;
            total=new byte[total0.length + len];
            System.arraycopy(total0, 0, total, 0, total0.length);
            System.arraycopy(buffer, 0, total, total0.length, len);
        }
    }
    while (len >= 0);
    return total;
}

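private static byte[] readFixedLengthBuffer(InputStream input, int size)
    throws IOException
{
    byte[] buffer=new byte[size];
    int pos=0;
    while (pos < size)
    {
        int len=input.read(buffer, pos, size - pos);
        if (len < 0)
        {
            // End of stream before the declared size: the file is truncated or corrupt.
            // Without this check the loop would never terminate.
            throw new EOFException("Expected " + size + " bytes but found only " + pos);
        }
        pos+=len;
    }
    return buffer;
}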

private static class FileData
{
    private final String name;

    private final byte[] data;

    public FileData(String name, byte[] data)
    {
        super();
        this.name=name;
        this.data=data;
    }

    public String getName()
    {
        return this.name;
    }

    public byte[] getData()
    {
        return this.data;
    }
}
Little Santi

My take on this is that the jar file (which in fact is a zip file) has a Central Directory which is only read with the ZipFile (or JarFile) class. The Central Directory contains some data about the entries such as the size.

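ZipInputStream does not read the Central Directory; it only sees the local header in front of each entry. When the archive was written in streaming mode, the sizes are stored after the entry data, so the ZipEntry will not contain the size (getSize() returns -1 as it is still unknown), whereas a ZipEntry read from the ZipFile class will.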

So if you first read the size of each entry using a ZipFile and store that in a map, you can easily get it when reading the data with the ZipInputStream.
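The difference can be seen with a small experiment. This is a sketch under the assumption that the archive was written in streaming mode, which is what ZipOutputStream does by default for deflated entries; EntrySizeDemo and compareSizes are names made up for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class, not part of the original program.
final class EntrySizeDemo {

    // Returns {size seen by ZipInputStream before reading the entry, size seen by ZipFile}.
    static long[] compareSizes(byte[] content) throws IOException {
        Path zip = Files.createTempFile("entry-size-demo", ".zip");
        try {
            // Stream-written deflated entries store their size after the data.
            try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zip))) {
                out.putNextEntry(new ZipEntry("data.bin"));
                out.write(content);
                out.closeEntry();
            }
            long streamSize;
            try (ZipInputStream in = new ZipInputStream(Files.newInputStream(zip))) {
                streamSize = in.getNextEntry().getSize();
            }
            long fileSize;
            try (ZipFile zipFile = new ZipFile(zip.toFile())) {
                fileSize = zipFile.getEntry("data.bin").getSize();
            }
            return new long[] {streamSize, fileSize};
        } finally {
            Files.deleteIfExists(zip);
        }
    }

    public static void main(String[] args) throws IOException {
        long[] sizes = compareSizes(new byte[434]);
        System.out.println("ZipInputStream: " + sizes[0] + ", ZipFile: " + sizes[1]);
    }
}
```

Archives produced by other tools may record the sizes in the local headers as well, in which case both values would match.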

This page includes some good examples as well.

So my version of your code would be:

import java.io.*;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;

public class JarRepacker {

    public static void main(String[] args) throws Throwable {
        JarRepacker repacker = new JarRepacker();
        repacker.repackJarToMyFileFormat("commons-cli-1.3.1.jar", "randomJarOut.bin");
        repacker.readMyFileFormat("randomJarOut.bin");
    }
    
    private void repackJarToMyFileFormat(String inputJar, String outputFile) throws Throwable {
        int entryCount;
        Map<String, Integer> sizeMap = new HashMap<>();
        try (ZipFile zipFile = new ZipFile(inputJar)) {
            entryCount = zipFile.size();
            zipFile.entries().asIterator().forEachRemaining(e -> sizeMap.put(e.getName(), (int) e.getSize()));
        }

        try (final DataOutputStream outputStream = new DataOutputStream(new FileOutputStream(outputFile))) {

            outputStream.writeInt(entryCount);

            try (ZipInputStream stream = new ZipInputStream(new BufferedInputStream(new FileInputStream(inputJar)))) {
                ZipEntry entry;
                final byte[] buffer = new byte[2048];
                while ((entry = stream.getNextEntry()) != null) {
                    final String name = entry.getName();
                    outputStream.writeUTF(name);
                    final Integer size = sizeMap.get(name);
                    outputStream.writeInt(size);
                    //System.out.println("Writing: " + name + " Size: " + size);

                    int len;
                    while ((len = stream.read(buffer)) > 0) {
                        outputStream.write(buffer, 0, len);
                    }
                }
            }
            outputStream.flush();
        }
    }

    private void readMyFileFormat(String fileToRead) throws IOException {
        try (DataInputStream dataInputStream
                     = new DataInputStream(new BufferedInputStream(new FileInputStream(fileToRead)))) {

            int entries = dataInputStream.readInt();
            System.out.println("Entries in file: " + entries);
            for (int i = 1; i <= entries; i++) {
                final String name = dataInputStream.readUTF();
                final int size = dataInputStream.readInt();
                System.out.printf("[%3d] Reading: %s of size: %d%n", i, name, size);
                final byte[] array = new byte[size];
                // readFully fills the whole array, or throws EOFException if the file is truncated.
                dataInputStream.readFully(array);
                // Still need to do something with this array...
            }
        }
    }

}

Anders Lindgren