I have a PHP script that must unzip some uploads. The uploads are packed folders, basically zip files with a custom extension.

I am having problems with some zip files packed on one machine, but not with the same folder packed on another machine. In both cases, the compression is done with the same Java library.

This is the expected result, which PHP then processes further: [image: correct extraction]

This is the corrupted result, which makes PHP choke: [image: corrupted extraction]

If I look at their permissions, this is what I see (01_Orig is okay, 02_Modif is corrupted): [image: permissions]

If I look at the two packages with unzip -l (the first one is okay, the second one is corrupt): [image: unzip -l]

And this is my PHP function (which is the same in both cases):

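$uploads = "uploads_dir/";
$dir = new DirectoryIterator($uploads);

foreach ($dir as $fileinfo) {
    // Only regular files are candidate packages; this skips "." and ".."
    // as well as any "*_extracted" directories left by a previous run.
    if ($fileinfo->isFile()) {
        $filename = $fileinfo->getFilename();
        $zip = new ZipArchive;
        $res = $zip->open($uploads . $filename);
        if ($res === TRUE) {
            // Extract each package into its own sibling directory.
            $zip->extractTo($uploads . $filename . "_extracted");
            $zip->close();
        } else {
            echo "Unable to unzip";
        }
    }
}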

Both uploads look fine when I manually unzip them or open them with 7zip on my Windows machine.

If I create two hex dumps of both zip files and compare them, this is what I get: https://gist.github.com/msoutopico/22a9ef647381c2e4d26313f135c526e2

Thanks a lot in advance for any tips.

UPDATE:

In case it's relevant, the zip files are created (saved) on a Linux server, and both machines where this is done (the one that works and the one that corrupts the package) run Windows 10.

msoutopico
  • So the same Java library is used. But clearly the files are not the same. I notice that one uses backslashes, and the other forward slashes. Linux/OS-X and Windows? The remaining differences might be due to the character set used. Is it correct that you have trouble extracting the content from the Windows ZIP file? – KIKO Software Mar 09 '20 at 13:59
  • Good point @KIKOSoftware. The Zip spec says that backslashes are not valid path separators. Only "/" is officially supported. Start by checking that the zip files created on both machines are valid. If you have "unzip", run "unzip -t" against the zip files. If that doesn't show a problem, run zipdetails (available here https://github.com/pmqs/IO-Compress/blob/master/bin/zipdetails if you don't already have it) on a good example and a bad example and post the results – pmqs Mar 09 '20 at 14:03
  • Thanks for your quick replies. My understanding was that PHP could understand both slash conventions. I've run `unzip -t ` with both packages and I get "No errors detected in compressed data of " in both cases. – msoutopico Mar 09 '20 at 14:07
  • Just to clarify, both the correct and the corrupt packages are created on two Windows machines running the same Windows version, and the Java library used to pack the contents is the same. – msoutopico Mar 09 '20 at 14:08
  • The terminal windows you just posted appears to show a Linux (or equivalent) environment, but it includes filenames with backslashes. That looks wrong. I'm guessing that the zip file from the bad machine is including the backslashes when it writes the zip file and the code that is uncompressing is not converting the backslashes to forward slashes. Can you either post the zipdetails outout or just an unzip -l on a good & bad example? – pmqs Mar 09 '20 at 14:11
  • One PC also appears to be generating old DOS 8.3 filenames rather than modern filenames. Is one of these PCs a bit odd or older, or is its disk formatted differently? – RiggsFolly Mar 09 '20 at 14:13
  • https://support.microsoft.com/en-gb/help/121007/how-to-disable-8-3-file-name-creation-on-ntfs-partitions – RiggsFolly Mar 09 '20 at 14:15
  • Thanks @pmqs, I have updated my post with the result of the `unzip -l` command, but I'm not sure what you mean by "post the zipdetails outout" – msoutopico Mar 09 '20 at 14:24
  • @msoutopico - that was a typo. Should have read "port the zipdetails output". You don't need anything more though - the "unzip -l" output show the problem. One zip file is being created properly with "/" as the path separator, the other is using "\". The second one is a badly-formed zip file. Don't know how you've created the zip files, but I know there are issues with some windows applications. See https://superuser.com/questions/1382839/zip-files-expand-with-backslashes-on-linux-no-subdirectories – pmqs Mar 09 '20 at 14:30
  • @RiggsFolly: The page about disabling 8.3 file name creation on NTFS partitions doesn't seem to apply to Windows 10 (at least I can't see it listed). All the same, I have run `fsutil.exe behavior set disable8dot3 1` on a PowerShell terminal with admin rights in the faulty machine, but nothing seems to have changed. – msoutopico Mar 09 '20 at 14:34
  • So how do you explain the old DOS 8.3 filenames? – RiggsFolly Mar 09 '20 at 14:36
  • @RiggsFolly I guess the answer to that question is the purpose of this post.... – msoutopico Mar 09 '20 at 14:38
  • @pmqs: Both packages are created from a Java application that packs all contents of a folder using the standard Java library `java.util.zip.ZipOutputStream`. In both cases the Java application runs on a Windows 10 machine, and the zip file is saved on a Linux server. – msoutopico Mar 09 '20 at 14:40
  • @msoutopico I'd guess that there are two different versions of `java.util.zip.ZipOutputStream` being used. – pmqs Mar 09 '20 at 14:50
  • @pmqs: I have installed the plugin myself on both machines, downloading it from the same place, so that's very unlikely. The plugin is identical and, therefore, the `java.util.zip.ZipOutputStream` should be the same. – msoutopico Mar 09 '20 at 14:54
  • There has to be something different between the two boxes. I don't know `java.util.zip.ZipOutputStream`, so can't comment on what will influence the choice of path separator. It could be an environment variable, or something to do with how the operating system is configured. – pmqs Mar 09 '20 at 14:59
  • That's my best guess too. The thing would be to find out what makes the library do that on one machine but not on the other (find out which environment variable or config setting is different, as you say). Thanks all. Any more ideas welcome. – msoutopico Mar 09 '20 at 15:04
  • Both this https://superuser.com/questions/1382839/zip-files-expand-with-backslashes-on-linux-no-subdirectories and this https://stackoverflow.com/questions/57248542/zip-file-is-created-with-windows-path-separator seem related. – msoutopico Mar 09 '20 at 15:09
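
The diagnosis reached in the comments (entry names stored with "\" instead of "/") can also be checked programmatically. The following is only a minimal sketch, assuming the archive is readable by `java.util.zip.ZipFile`; the class name `BackslashCheck` and the method `backslashEntries` are hypothetical and not part of any plugin mentioned in this thread.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper for illustration only.
class BackslashCheck {

    // Returns the names of all entries that contain a backslash,
    // i.e. entries that were probably written with Windows path separators.
    static List<String> backslashEntries(Path zipPath) throws IOException {
        List<String> suspicious = new ArrayList<>();
        try (ZipFile zip = new ZipFile(zipPath.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.indexOf('\\') >= 0) {
                    suspicious.add(name);
                }
            }
        }
        return suspicious;
    }
}
```

An empty result suggests the package uses "/" throughout; any returned name points at an archive that will probably extract as flat files with backslashes in their names on Linux.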

1 Answer

Sorted. Version 2 of the plugin was tweaked to convert path separators from \ to / in entry names. Although version 3 of the plugin was installed on both machines, the faulty machine also had an older copy (version 1, which predates that tweak), and that copy was being used instead of version 3. Removing the version 1 duplicate fixed the problem. @pmqs was right. Thank you everyone for helping me quickly solve this!
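
For readers who hit the same problem in their own packer, here is a rough sketch of what such a separator tweak might look like when writing entries with `java.util.zip.ZipOutputStream`. It is not the plugin's actual code: the class `SlashZipper` and the methods `toEntryName` and `zipFolder` are hypothetical names, and the sketch assumes every backslash in a relative path is a separator (which holds on Windows, but not necessarily on Linux, where a backslash can be part of a file name).

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper for illustration only.
class SlashZipper {

    // Convert a platform-specific relative path into a zip entry name.
    // Assumption: every backslash is a path separator.
    static String toEntryName(String relativePath) {
        return relativePath.replace('\\', '/');
    }

    // Pack all regular files under root into zipFile, using "/" in entry names.
    static void zipFolder(Path root, Path zipFile) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile)
                        .sorted()
                        .collect(Collectors.toList());
        }
        try (OutputStream out = Files.newOutputStream(zipFile);
             ZipOutputStream zos = new ZipOutputStream(out)) {
            for (Path file : files) {
                String name = toEntryName(root.relativize(file).toString());
                zos.putNextEntry(new ZipEntry(name));
                Files.copy(file, zos);
                zos.closeEntry();
            }
        }
    }
}
```

Normalizing at write time keeps the archive valid for every consumer, so the PHP side needs no special handling.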

msoutopico