I have been using Spring's multipart upload controllers to upload and store entries from zipped files, but I am finding that the occasional PNG file is being corrupted: instead of beginning with something like "PNG..." in its byte[], it starts with "fþ»ÀÃgÞÉ" or similar. This happens to the same files on each run. I tried all of this using java.util.zip.ZipEntry and then tried Apache Commons Compress, and found that Commons Compress corrupted different files from the Java 7 utility, but always the same files on subsequent runs.

The code (firstly java.util.zip.ZipEntry):

protected void processZipFile(String path, MultipartFile file, String signature) throws IOException {

    DateFormat df = new SimpleDateFormat("yyyyMMddhhmmss");
    File tempFile = new File(System.getProperty("user.dir") + "/" + file.getName() + df.format(new Date()));
    file.transferTo(tempFile);

    ZipFile zipFile = null;
    try {
        zipFile = new ZipFile(tempFile);

        LOG.debug("Processing archive with name={}, size={}.", file.getName(), file.getSize());

        final Enumeration<? extends ZipEntry> entries = zipFile.entries();
        while ( entries.hasMoreElements() )
        {
            ZipEntry entry = entries.nextElement();

            LOG.debug("Processing file={} is directory?={}.", entry.getName(), entry.isDirectory());

            // we don't bother processing directories, and we don't process any resource fork info
            // from Mac OS X (which does not seem to be transparent to ZipFile).

            if (!(entry.isDirectory() || entry.getName().contains("__MACOSX")  || entry.getName().contains(".DS_Store"))) {
                // if the entry is a file, extract it


                Content contentToSave = null;

                if(entry.getName().contains("gif") || entry.getName().contains("png") || entry.getName().contains("jpeg")) {

                    byte[] bytes = readInputStream( zipFile.getInputStream( entry ), entry.getSize() );

                    LOG.debug("{} is of inflated-length={} from compressed-length={}",
                            entry.getName(), bytes.length, entry.getCompressedSize());

                    if(entry.getName().contains("gif")) {
                        contentToSave = Content.makeImage(path + entry.getName(), Content.GIF, signature, bytes);

                    } else if (entry.getName().contains("png")) {
                        contentToSave = Content.makeImage(path + entry.getName(), Content.PNG, signature, bytes);

                    } else if (entry.getName().contains("jpeg")) {
                        contentToSave = Content.makeImage(path + entry.getName(), Content.JPEG, signature, bytes);

                    }
                } else {

                    InputStream is = zipFile.getInputStream(entry);

                    if (entry.getName().contains("json")) {
                        contentToSave = Content.makeFile(path + entry.getName(), Content.JSON, signature, convertStreamToString(is));
                    } else if (entry.getName().contains("js")) {
                        contentToSave = Content.makeFile(path + entry.getName(), Content.JS, signature, convertStreamToString(is));
                    } else if (entry.getName().contains("css")) {
                        contentToSave = Content.makeFile(path + entry.getName(), Content.CSS, signature, convertStreamToString(is));
                    } else if (entry.getName().contains("xml")) {
                        contentToSave = Content.makeFile(path + entry.getName(), Content.XML, signature, convertStreamToString(is));
                    } else if (entry.getName().contains("html")) {
                        contentToSave = Content.makeFile(path + entry.getName(), Content.HTML, signature, convertStreamToString(is));
                    }
                }

                contentService.putOrReplace(contentToSave);
                LOG.info("Persisted file: {} from uploaded version.", contentToSave.getName());
            }

        }
    } catch (ZipException e) {
        // If I can't create a ZipFile, then this is not a zip file at all and it cannot be processed
        // by this method. It's pretty dumb that there's no way to determine whether the contents are zipped through
        // the ZipFile API, but that's just one of its many problems.
        e.printStackTrace();
        LOG.error("{} is not a zipped file, or it is empty", file.getName());
    } finally {
        zipFile = null;
    }
    tempFile.delete();

}

And now the same thing for org.apache.commons.compress.archivers.zip.ZipFile:

protected void processZipFile(String path, MultipartFile file, String signature) throws IOException {

    DateFormat df = new SimpleDateFormat("yyyyMMddhhmmss");
    File tempFile = new File(System.getProperty("user.dir") + "/" + file.getName() + df.format(new Date()));
    file.transferTo(tempFile);

    ZipFile zipFile = null;
    try {
        zipFile = new ZipFile(tempFile);

        LOG.debug("Processing archive with name={}, size={}.", file.getName(), file.getSize());

        final Enumeration<? extends ZipArchiveEntry> entries = zipFile.getEntries();
        while ( entries.hasMoreElements() ) {
            ZipArchiveEntry entry = entries.nextElement();

            LOG.debug("Processing file={} is directory?={}.", entry.getName(), entry.isDirectory());

                // we don't bother processing directories, and we don't process any resource fork info
                // from Mac OS X (which does not seem to be transparent to ZipFile).

                if (!(entry.isDirectory() || entry.getName().contains("__MACOSX")  || entry.getName().contains(".DS_Store"))) {
                    // if the entry is a file, extract it

                    Content contentToSave = null;

                    if(entry.getName().contains("gif") || entry.getName().contains("png") || entry.getName().contains("jpeg")) {

                        byte[] bytes = readInputStream( zipFile.getInputStream( entry ), entry.getSize() );

                        LOG.debug("{} is of inflated-length={} from compressed-length={}",
                            entry.getName(), bytes.length, entry.getCompressedSize());

                        if(entry.getName().contains("gif")) {
                            contentToSave = Content.makeImage(path + entry.getName(), Content.GIF, signature, bytes);

                        } else if (entry.getName().contains("png")) {
                            contentToSave = Content.makeImage(path + entry.getName(), Content.PNG, signature, bytes);

                        } else if (entry.getName().contains("jpeg")) {
                            contentToSave = Content.makeImage(path + entry.getName(), Content.JPEG, signature, bytes);

                        }
                    } else {

                        InputStream is = zipFile.getInputStream(entry);

                        if (entry.getName().contains("json")) {
                            contentToSave = Content.makeFile(path + entry.getName(), Content.JSON, signature, convertStreamToString(is));
                        } else if (entry.getName().contains("js")) {
                            contentToSave = Content.makeFile(path + entry.getName(), Content.JS, signature, convertStreamToString(is));
                        } else if (entry.getName().contains("css")) {
                            contentToSave = Content.makeFile(path + entry.getName(), Content.CSS, signature, convertStreamToString(is));
                        } else if (entry.getName().contains("xml")) {
                            contentToSave = Content.makeFile(path + entry.getName(), Content.XML, signature, convertStreamToString(is));
                        } else if (entry.getName().contains("html")) {
                            contentToSave = Content.makeFile(path + entry.getName(), Content.HTML, signature, convertStreamToString(is));
                        }
                    }

                    contentService.putOrReplace(contentToSave);
                    LOG.info("Persisted file: {} from uploaded version.", contentToSave.getName());
                }

        }
    } catch (ZipException e) {
        e.printStackTrace();
        LOG.error("{} is not a zipped file, or it is empty", file.getName());
    } catch (IOException e) {
        e.printStackTrace();
        LOG.error("{} is not a file, or it is empty", file.getName());
    } finally {
        zipFile = null;
    }
    tempFile.delete();
}

The two called methods are:

private static byte[] readInputStream( final InputStream is, final long length ) throws IOException {
    final byte[] buf = new byte[ (int) length ];
    int read = 0;
    int cntRead;
    while ( ( cntRead = is.read( buf, 0, buf.length ) ) >=0  )
    {
        read += cntRead;
    }
    return buf;
}

and:

public String convertStreamToString(InputStream is) throws IOException {
    StringBuilder sb = new StringBuilder(2048);
    char[] read = new char[128];
    try (InputStreamReader ir = new InputStreamReader(is, StandardCharsets.UTF_8)) {
        for (int i; -1 != (i = ir.read(read)); sb.append(read, 0, i));
    }

    // need to remove the ? at the beginning of some files. This comes from the UTF8 BOM
    // that is added to some files saved as UTF8
    String out = sb.toString();
    String utf8Bom = new String(new char[]{'\ufeff'});
    if(out.contains(utf8Bom)) {
        out = out.replace(utf8Bom,"");
    }
    return out;
}

The second one is, of course, not likely part of the problem.

I have googled around and it looks like similar issues have been reported, but the cause has always turned out to be something external. Does anyone know why this might be happening?

I have re-edited some images and found that if I change the image to black and white, or change the hue of the whole image, the problem goes away, but if I add a border or change a single colour the problem remains. It looks like a particular arrangement of bytes in some files tickles a bug in whatever underlying API that both Java's own and Apache's compressed file readers use, but that's just speculation.
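To flag affected entries before they are persisted, the first eight bytes can be compared against the PNG signature. This is only a diagnostic sketch: the class name `PngSignature` and the method `looksLikePng` are hypothetical and not part of the code above.

```java
import java.util.Arrays;

public class PngSignature {

    // The eight-byte signature that every PNG file starts with.
    private static final byte[] PNG_MAGIC = {
        (byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A
    };

    // Hypothetical helper: true if the array starts with the PNG signature.
    static boolean looksLikePng(byte[] bytes) {
        return bytes != null
            && bytes.length >= PNG_MAGIC.length
            && Arrays.equals(Arrays.copyOf(bytes, PNG_MAGIC.length), PNG_MAGIC);
    }

    public static void main(String[] args) {
        byte[] good = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A, 0, 0};
        byte[] bad = {'f', (byte) 0xFE, (byte) 0xBB, (byte) 0xC0, 0, 0, 0, 0};
        System.out.println(looksLikePng(good));
        System.out.println(looksLikePng(bad));
    }
}
```

Calling this on the `bytes` array right after `readInputStream` returns would show whether the data is already damaged before it reaches `Content.makeImage` or the database.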

EDIT: additional usage shows that the corruption happens in GIFs over 10K in size, so perhaps this has something to do with the bug? I tried arbitrarily doubling the size of the buffer in the call to readInputStream(), but it did nothing except overflow the blob size in MySQL for particularly large images (49K became 98K, which was too big):

com.mysql.jdbc.MysqlDataTruncation: Data truncation: Data too long for column 'encoded_content' at row 1
Michael Coxon

1 Answer

My finding is that this issue arises when the 'packed size' is larger than the actual size. This can happen, for example, with PNG files, which are already 'zipped' themselves.
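Separately from the packed size, it may be worth checking the read loop in the question. `readInputStream` passes offset 0 on every call to `read`, so if the stream delivers an entry in more than one chunk, each later chunk overwrites the start of the buffer. That would be consistent with a file that no longer begins with the PNG signature. Below is a minimal sketch of a loop that advances the offset instead, assuming the entry size is known and fits in an `int`. The class name `ReadFully` and the chunked test stream are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class ReadFully {

    // Reads exactly `length` bytes, advancing the offset after each partial read.
    static byte[] readFully(InputStream is, long length) throws IOException {
        // ZipEntry.getSize() returns -1 when the size is unknown.
        if (length < 0 || length > Integer.MAX_VALUE) {
            throw new IOException("Unsupported entry size: " + length);
        }
        byte[] buf = new byte[(int) length];
        int read = 0;
        while (read < buf.length) {
            // Write the next chunk after the bytes already read, not at offset 0.
            int cnt = is.read(buf, read, buf.length - read);
            if (cnt < 0) {
                throw new EOFException("Stream ended after " + read + " of " + buf.length + " bytes");
            }
            read += cnt;
        }
        return buf;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[20000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        // A stream that returns at most 512 bytes per call, to mimic partial reads.
        InputStream chunked = new ByteArrayInputStream(data) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, 512));
            }
        };
        byte[] out = readFully(chunked, data.length);
        System.out.println(Arrays.equals(data, out));
    }
}
```

`InputStream.read(byte[], int, int)` is allowed to return fewer bytes than requested, so small entries that arrive in a single read would come through intact while larger ones would not.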

Frank