My PHP script converts a PDF file into a zip file containing one image for each page of the PDF.

After adding the images to the zip, I send the zip to the browser with the headers below.

ob_start();

header('Content-Transfer-Encoding: binary');
header('Content-disposition: attachment; filename="converted.ZIP"');
header('Content-type: application/octet-stream');

ob_end_clean();

readfile($tmp_file);
unlink($tmp_file);

exit();

The download works fine on Windows, Linux and Mac. But when I request the same page from an Android device (the stock browser or Chrome), an unreadable zip is downloaded. Opening it in the file explorer gives "File is either corrupt or unsupported format". This happens on Android 6 and later (not tested on earlier versions).

I also tried moving the ob_start() and ob_end_clean() calls later in the script, but that didn't help either.

I checked many answers on Stack Overflow, but none of them worked, for example:

  1. Forceful download not working for browser on Android phone on wap site
  2. Not able to download files with php in android browsers

What modification is needed for Android browsers?

<?php include 'headerHandlersCopy.php';
session_start();  
ob_start();
//echo session_id()."<br>";
?>

<!DOCTYPE html>
<html>
    <head>
        <meta charset="utf-8">
        <meta name="viewport" content="width=device-width, initial-scale=1">
        <link rel="stylesheet" href="../css/handleConvertPDFtoJPG.css">
        <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/bootstrap@5.1.3/dist/css/bootstrap.min.css">


        <title>Compressing Image</title>
    </head>

    <body>
        <!-- Progress bar -->

        <div id="wrapper">
            <h1 id="head1">Compressing Image</h1>
            <h1 id="head2">Converting Image</h1>
            <div id="myProgress">
                <div id="myBar">10%</div>
            </div>
            <br>
        </div>
        
        <!-- end -->

        <?php 
      
            //code to display errors.
            ini_set('display_errors', 1);
            ini_set('display_startup_errors', 1);
            error_reporting(E_ALL); 
             
            if ($_SERVER['REQUEST_METHOD'] == 'POST'){

                $session_id = session_id();
                $uploadPath = "../upload/pdfUploads/"; 
                $pdfFileNameWithOutExt = basename($_FILES["pdfDoc"]["name"],"pdf");
                $dotRemovedFileNameTemp = str_replace(".", "", $pdfFileNameWithOutExt);
                $dotRemovedFileName = $session_id.$dotRemovedFileNameTemp;
                

                $imgExt = ".jpg";
                $fileNameLocationFormat = $uploadPath.$dotRemovedFileName.$imgExt;
                $fileNameLocation = $uploadPath.$dotRemovedFileName;
                $status = null;

                $imagick = new Imagick();

                # to get number of pages in the pdf to run loop below.
                # the below function generates unreadable images for each page.
                $imagick->pingImage($_FILES['pdfDoc']['tmp_name']);
                $noOfPagesInPDF = $imagick->getNumberImages();
                
                $imagick->readImage($_FILES['pdfDoc']['tmp_name']);
                $statusMsg = "test";

                # writing pdf into images.
                try {
                    $imagick->writeImages($fileNameLocationFormat, true);
                    $status = 1; 
                }
                catch(Exception $e) {
                    echo 'Message: ' .$e->getMessage();
                    $status = 0;
                }

                $files = array();

                # storing converted images into array.
                # only including the readable images in the array.
                $arrayEndIndex = ($noOfPagesInPDF * 2)-1;
                for ($x = $arrayEndIndex; $x >= $noOfPagesInPDF; $x--) {
                    array_push($files,"{$fileNameLocation}-{$x}.jpg" );
                }

                # create new zip object
                $zip = new ZipArchive();

                # create a temp file & open it
                $tmp_file = tempnam('.', '');
                $zip->open($tmp_file, ZipArchive::CREATE);

                # loop through each file
                foreach ($files as $file) {
                    # download file
                    $download_file = file_get_contents($file);

                    #add it to the zip
                    $zip->addFromString(basename($file), $download_file);
                }

                # close zip
                $zip->close();


                # file cleaning code
                # only those pdf files will be deleted which the current user uploaded.
                # we match the session id of the user and delete the files which contain the same session id in the file name.
                # file naming format is: session_id + destination + fileName + extension
                
                $files = glob("../upload/pdfUploads/{$session_id}*"); // get all file names
                foreach($files as $file){ // iterate files
                  if(is_file($file)) {
                    unlink($file); // delete file
                  }
                }

                // send the file to the browser as a download
                ob_end_clean();


                header('Content-Description: File Transfer');
                header('Content-type: application/octet-stream');
                header('Content-disposition: attachment; filename="geek.zip"');
                //header("Content-Length: " . filesize($tmp_file));
                header('Content-Transfer-Encoding: binary');
                header('Expires: 0');
                header('Cache-Control: must-revalidate');
                header('Pragma: public');
                flush();
                readfile($tmp_file);  
                unlink($tmp_file);      
                
                //filesize($tmp_file) causing the "error opening the file" when opening the zip even in PC browsers.
            }
        ?>

Will B.
user10418143
  • Do you mean a separate android app for browsing the zip file ? – user10418143 Jul 11 '22 at 12:37
  • I dont think so, cause I checked out downloading zip from other sites which is working fine without any app. – user10418143 Jul 11 '22 at 12:40
  • https://www.howtogeek.com/691018/how-to-open-a-zip-file-on-an-android-phone-or-tablet/ – RiggsFolly Jul 11 '22 at 12:44
  • https://androidcoach.ngontinh24.com/articles/how-to-read-zip-file-on-android-phone#toc-0 – RiggsFolly Jul 11 '22 at 12:46
  • Thanks mate for the reference links. Check out the zips from this links , it doesn't need any app and opens in default explorer. I was talking about this. – user10418143 Jul 11 '22 at 13:03
  • https://www.c-programming-simple-steps.com/c-programming-examples.html – user10418143 Jul 11 '22 at 13:03
  • _'On opening it through the file explorer it says "File is either corrupt or unsupported format'"_ - what do you see, when you open the file in a text or hex editor instead? Are there any PHP error messages contained in what should actually be the binary PDF data? – CBroe Jul 11 '22 at 13:15
  • I just opened the unreadable in the text editor in which the first 50 lines were html and then rest a unreadable format like this "PK<0*03><0*04><0*14><0*00>ï”ëTø¾×ê6" – user10418143 Jul 11 '22 at 13:26
  • Please post the full PHP file contents for the download and zip file conversion scripts. For general tips; `ob_start();` should be on the first line after ` – Will B. Jul 11 '22 at 13:49
  • What about also adding `header("Content-Length: " . filesize($tmp_file));` ? – Patrick Janser Jul 11 '22 at 14:00
  • For which reason are you using the *output buffering* functions? Is it to capture errors? Usually *output buffering* is used to capture the output and alter it. Here, all the `header()` calls should not be printing to the buffer as they are just setting the response headers. But later, if `readfile()` did not totally print the content to the output stream then you could use `ob_flush()` or `ob_end_flush()` before your `exit()` and keep your `ob_start()`. But in this case replace `ob_end_clean()` by `ob_clean()` as we don't want to close the output buffer before flushing it to send it. – Patrick Janser Jul 11 '22 at 14:22
  • Hi Will B. I have added the link of my full code in the post. – user10418143 Jul 11 '22 at 14:28
  • header("Content-Length: " . filesize($tmp_file)); causing the "error opening the file" even in the desktop browsers. – user10418143 Jul 11 '22 at 14:29
  • You are printing some HTML code and also doing some logic to send the ZIP content in the same PHP file. OK, this is possible but in this case don't mix both of them. No HTML should be sent to the output buffer if you are sending some binary ZIP data. Now I see why you were using the *output buffering*. It was to clean it up in order to avoid sending the HTML content. Better move up your `if ($_SERVER['REQUEST_METHOD'] == 'POST') {` condition and don't print any text if you are sending binary data. Why would `filesize($tmp_file)` not work and `readfile($tmp_file)` work? Async lib? – Patrick Janser Jul 11 '22 at 14:58
  • So should all the html content be removed and need to keep only the php part ? – user10418143 Jul 11 '22 at 15:08

1 Answer

The issue appears to be caused by the order of operations: the script emits HTML before the POST handling runs, and that HTML ends up in the response sent to the client along with the zip data.

To circumvent the issues, I recommend using a separate script file for the POST request handler, as opposed to including it procedurally in the same view script. Otherwise, wrap the POST request processing in an if condition at the top of the script, ending it with exit to stop the response from continuing further.
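
As a rough illustration of the separate-handler idea, here is a minimal sketch. The function names `buildDownloadHeaders()` and `sendDownload()` are hypothetical and not part of the original code, and the download name is assumed to be a plain ASCII file name. It discards any buffered output, refuses to continue if output has already been sent, and only then sends the headers and the file.

```php
<?php
// Hypothetical helper: builds the header lines for a file download.
// Kept separate from sending so it can be checked without emitting output.
function buildDownloadHeaders(string $path, string $downloadName): array
{
    clearstatcache(true, $path); // make sure filesize() reads the current size
    return [
        'Content-Type: application/octet-stream',
        'Content-Disposition: attachment; filename="' . $downloadName . '"',
        'Content-Length: ' . filesize($path),
    ];
}

// Hypothetical helper: sends the file as the only content of the response.
function sendDownload(string $path, string $downloadName): void
{
    // Discard anything buffered so far (stray HTML, whitespace from includes).
    while (ob_get_level() > 0 && @ob_end_clean()) {
        // keep closing buffers until none are left or one cannot be removed
    }
    if (headers_sent()) {
        // Output already reached the client, so the download would be corrupted.
        throw new RuntimeException('Output was already sent before the download.');
    }
    foreach (buildDownloadHeaders($path, $downloadName) as $line) {
        header($line);
    }
    readfile($path);
}
```

A dedicated handler script (say, a hypothetical download.php) would build the zip, call `sendDownload($zipFile, 'geek.zip');` and then `exit;`, without containing any HTML at all.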

This also partly explains the problem with the filesize() call: the response contains the zip file plus the extra HTML, so its size differs from the size of the zip file alone.
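
The size mismatch can be sketched with stand-in data. In the snippet below, the file contents and the leaked HTML are made-up placeholders, not the real zip or page. It compares the length that Content-Length would announce with the body that would actually be sent, assuming a client that reads only as many bytes as the header declares.

```php
<?php
// Stand-in "zip" file: the zip signature followed by filler bytes (placeholder data).
$zipFile = tempnam(sys_get_temp_dir(), 'zip');
file_put_contents($zipFile, "PK\x03\x04" . str_repeat("\0", 26));

// Stand-in for the HTML that was emitted before the zip data.
$leakedHtml = "<!DOCTYPE html>\n<html>\n";

clearstatcache(true, $zipFile);
$declaredLength = filesize($zipFile);                     // what Content-Length would announce
$actualBody = $leakedHtml . file_get_contents($zipFile);  // what would really be sent

// A client that trusts Content-Length keeps only the first $declaredLength bytes.
$received = substr($actualBody, 0, $declaredLength);
$startsWithZipSignature = strncmp($received, "PK\x03\x04", 4) === 0;

unlink($zipFile);

var_dump(strlen($actualBody) > $declaredLength); // the body is longer than declared
var_dump($startsWithZipSignature);               // the saved file no longer starts with "PK"
```

In this sketch the saved data begins with HTML and the end of the archive is cut off, which is consistent with the asker finding HTML at the top of the unreadable file.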

The below was tested in Google Chrome for Windows and Android 11.

<?php
# code to display errors
// USE ERROR REPORTING TO LOG FILES INSTEAD
# ini_set('display_errors', 1);
# ini_set('display_startup_errors', 1);
# error_reporting(E_ALL);
if (!session_id()) {
    // always ensure session is not already started before starting
    session_start();
}
if ('POST' === $_SERVER['REQUEST_METHOD'] &&
    !empty($_FILES['pdfDoc']) && // ensure files were uploaded
    UPLOAD_ERR_OK === $_FILES['pdfDoc']['error'] // ensure file uploaded without errors
) {
    $session_id = session_id();
    // removed redundant variable names
    // use absolute path with __DIR__ instead of relative
    $uploadSessionPath = $uploadPath = __DIR__ . '/../upload/pdfUploads/';
    $uploadSessionPath .= $session_id; // append session path
    // ensure upload directory exists
    if (!is_dir($uploadPath) && !mkdir($uploadPath, 0777, true) && !is_dir($uploadPath)) {
        throw new \RuntimeException(sprintf('Directory "%s" was not created', $uploadPath));
    }
    $fileNameLocation = $uploadSessionPath . str_replace('.', '', basename($_FILES['pdfDoc']['name'], 'pdf'));

    # convert pdf pages into images and save as JPG in the upload session path.
    try {
        $pdfDocFile = $_FILES['pdfDoc']['tmp_name'];
        $imagick = new Imagick();
        # get number of pages in the pdf to loop over images below.
        $imagick->pingImage($pdfDocFile);
        $noOfPagesInPDF = $imagick->getNumberImages();
        $imagick->setResolution(150, 150); // greatly improve image quality
        $imagick->readImage($pdfDocFile);
        $imagick->writeImages($fileNameLocation . '.jpg', true);
    } catch (Exception $e) {
        throw $e; //handle the exception properly - don't ignore it...
    }
    // ensure there are pages to zip
    if ($noOfPagesInPDF > 0) {
        // reduced to single iteration of files to reduce redundancy
        # create a temp file & open it
        $zipFile = tempnam(__DIR__, ''); // use absolute path instead of relative
        # create new zip object
        $zip = new ZipArchive();
        $zip->open($zipFile, ZipArchive::CREATE);
        # store converted images to zip file only including the readable images
        $arrayEndIndex = ($noOfPagesInPDF * 2) - 1;
        for ($x = $arrayEndIndex; $x >= $noOfPagesInPDF; $x--) {
            $file = sprintf('%s-%d.jpg', $fileNameLocation, $x);
            clearstatcache(false, $file); // ensure stat cache is clear
            // ensure file exists and is readable
            if (is_file($file) && is_readable($file)) {
                // use ZipArchive::addFile instead of ZipArchive::addFromString(file_get_contents) to reduce overhead
                $zip->addFile($file, basename($file));
            }
        }
        $zip->close();

        # file cleaning code
        # only those pdf files will be deleted which the current user uploaded.
        # we match the session id of the user and delete the files which contains the same session id in the file name.
        # file naming format is: session_id + destination + fileName + extension
        foreach (glob("$uploadSessionPath*") as $file) {
            clearstatcache(false, $file); // ensure stat cache is clear
            // ensure a file exists and can be deleted
            if (is_file($file) && is_writable($file)) {
                unlink($file);
            }
        }

        # send the file to the browser as a download
        if (is_file($zipFile) && is_readable($zipFile)) {
            header('Content-Description: File Transfer');
            header('Content-type: application/octet-stream');
            header('Content-disposition: attachment; filename="geek.zip"');
            header('Content-Length: ' . filesize($zipFile)); // Content-Length is a best-practice to ensure client receives the expected response, if it breaks the download - something went wrong
            header('Content-Transfer-Encoding: binary');
            header('Expires: 0');
            header('Cache-Control: must-revalidate');
            header('Pragma: public');
            readfile($zipFile);
            if (is_writable($zipFile)) {
                unlink($zipFile);
            }
            exit; // stop processing
        }
        // no pages in PDF were found - do something else
    }

   // file was not sent as a response - do something else
}

// use absolute path __DIR__ and always require dependencies to ensure they are included
// do not know what this contains...
require_once __DIR__ . '/headerHandlersCopy.php'; 
?>

<!DOCTYPE html>
<html>
    <head>
        <meta charset="utf-8">
        <meta name="viewport" content="width=device-width, initial-scale=1">
        <link rel="stylesheet" href="../css/handleConvertPDFtoJPG.css">
        <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/bootstrap@5.1.3/dist/css/bootstrap.min.css">


        <title>Compressing Image</title>
    </head>

    <body>
        <!-- Progress bar -->

    <div id="wrapper">
        <h1 id="head1">Compressing Image</h1>
        <h1 id="head2">Converting Image</h1>
        <div id="myProgress">
            <div id="myBar">10%</div>
        </div>
        <br>
    </div>

Android File Manager Screenshot

Lastly, as a general tip, do not use the PHP closing tag ?> at the end of a PHP-only script; use it only when switching to non-PHP output such as HTML or text. Otherwise, any line breaks, spaces and other invisible characters after the ?> are included in the response output, which often causes unexpected results with include or corrupts responses such as file downloads and redirects.

PHP only response

<?php 
// ...
echo 'PHP ends automatically without closing tag';

End response with PHP

<html>
</html>
<?php 

echo 'PHP ends automatically without closing tag';

Mixed response with PHP

<html>
<?php 

echo 'Mixed PHP continues as HTML';

?>
</html>
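
To see the effect of the closing tag, here is a small sketch that writes two throwaway include files (the temp file names and the `$a` variable are placeholders for the demo) and captures what each one outputs when included. PHP drops a single newline directly after ?>, so only the extra blank line leaks from the first file.

```php
<?php
// Two temporary include files: one ends with "?>" plus a blank line, one has no closing tag.
$withTag = tempnam(sys_get_temp_dir(), 'inc');
$withoutTag = tempnam(sys_get_temp_dir(), 'inc');
file_put_contents($withTag, "<?php\n\$a = 1;\n?>\n\n");
file_put_contents($withoutTag, "<?php\n\$a = 1;\n");

// Capture whatever each include writes to the output.
ob_start();
include $withTag;
$leakedWithTag = ob_get_clean();

ob_start();
include $withoutTag;
$leakedWithoutTag = ob_get_clean();

unlink($withTag);
unlink($withoutTag);

var_dump(strlen($leakedWithTag));    // bytes leaked by the file with the closing tag
var_dump(strlen($leakedWithoutTag)); // bytes leaked by the file without it
```

Even a single leaked byte is enough to shift binary data or to make a later header() call fail, which is why omitting the closing tag in PHP-only files is the safer habit.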
Will B.
  • Thanks mate, I will adhere to the guidelines from now on. I solved it by stripping away all the HTML tags from the PHP script file and it's working fine now. I never knew HTML tags can create such problems in the headers. – user10418143 Jul 14 '22 at 05:16