
I use the following code to copy/download files from an external server (any server reachable via a URL) to my hosted web server (Dreamhost shared hosting at default settings).

<!DOCTYPE html>
<html>
<head>
    <title></title>
</head>
<body>
<form method="post" action="copy.php">
    <input type="submit" value="click" name="submit">
</form>
</body>
</html>
<!-- copy.php file contents -->
<?php
function chunked_copy() {
    # 1 meg at a time, adjustable.
    $buffer_size = 1048576; 
    $ret = 0;
    $fin = fopen("http://www.example.com/file.zip", "rb");
    $fout = fopen("file.zip", "w");
    while(!feof($fin)) {
        $ret += fwrite($fout, fread($fin, $buffer_size));
    }
    fclose($fin);
    fclose($fout);
    return $ret; # return number of bytes written
}
if(isset($_POST['submit']))
{
   chunked_copy();
} 
?>

However, the function stops running once about 2.5GB of the file has downloaded (sometimes 2.3GB, sometimes 2.7GB, and so on). This happens every time I execute the function. Smaller files (under 2GB) rarely exhibit this problem. I believe nothing is wrong with the source, as I separately downloaded the file flawlessly onto my home PC.

Can someone please remedy and explain this problem to me? I am very new to programming.

Also,

file_put_contents("Tmpfile.zip", fopen("http://example.com/file.zip", 'r')); 

exhibits the same symptoms.

salmanxk
  • A bit of a long shot, but have you considered the maximum file size on the destination file system? There are several FS's with a 2GB max for individual files (FAT16/FATX/HFS/HPFS). I know that's not exactly 2.3GB, 2.5GB or 2.7GB, but perhaps the reported error masks the exact error a bit? – Marten Koetsier Aug 23 '15 at 13:05
  • That is most definitely not the case. I do not get any reported error. The 2.3/2.5/2.6GB is the size of the incomplete file written to the disk. The incomplete file remains on the server until I delete it. A file-system issue would probably abort the write at a fixed point, say always at the 1GB mark. – salmanxk Aug 23 '15 at 14:26
  • You're right that the varying end-sizes do not suggest a hard limit, also not the 2GB mentioned in my answer... Especially since you mention the increase to 3GB with Muhammet Arslan's answer. Interesting issue. I'll try to give it more thought. – Marten Koetsier Aug 23 '15 at 14:34
  • Your code suggests that you download a zip-file. I don't suppose that that zip-file could be divided over two or more smaller files? – Marten Koetsier Aug 23 '15 at 14:51
  • Sadly, the source is not under my control. It is a Google Developer Group repository over 20GB in size divided into 5GB zip archives! – salmanxk Aug 23 '15 at 17:44

6 Answers


I think the problem might be the 30-second time-out that many servers impose on PHP scripts.

PHP scripts run via cron or the shell won't have that problem, so perhaps you could find a way to do it that way.

Alternatively you could add set_time_limit([desired time]) to the start of your code.
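
For example (a minimal sketch, assuming your host honors set_time_limit; the chunked_copy() call is the one from the question):

<?php
// Allow this request to run for up to one hour; set_time_limit(0) would
// remove the limit entirely, if the hosting provider permits that.
set_time_limit(3600);

if (isset($_POST['submit'])) {
    chunked_copy();
}
?>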

Peter Blue
  • 192
  • 6
  • This might well be the problem! If I am correct, half a minute is the time when the write/script stops. I'll contact support and ask them about this! – salmanxk Aug 23 '15 at 18:06
  • You could get the maximum execution time setting yourself: `ini_get('max_execution_time'); ` – Marten Koetsier Aug 23 '15 at 18:13
  • @Peter Blue: welcome on StackOverflow! Good to see a nice answer on your first SO-day! :) – Marten Koetsier Aug 23 '15 at 18:16
  • The response was 30 :) Is it possible to set/override the global max_execution_time only for one script separately or do I have to mess with the .htaccess files? – salmanxk Aug 24 '15 at 06:00
  • It probably is possible, see my new answer. – Marten Koetsier Aug 24 '15 at 09:54
  • I found a better solution: Add "set_time_limit([desired time]);" to the function that needs to override the global max_execution_time. It is bad practice to increase the global max_execution_time without any reason. Please edit your answer :) – salmanxk Aug 24 '15 at 13:48
  • @salmanxk: I don't agree. First, `set_time_limit` is essentially a wrapper around an `ini_set` call. Moreover: `ini_set` does nothing to the global setting, it is only set for the current script. Using `set_time_limit` in the 'offending' function may get your script to run for too long a time (if the server software allows that). Therefore, IMHO, the `ini_set` is better. See also [this SO Q&A](http://stackoverflow.com/questions/8914257/difference-between-set-time-limit-and-ini-setmax-execution-time) on the topic. – Marten Koetsier Aug 25 '15 at 10:13

Maybe you can try cURL to download the file.

function downloadUrlToFile($url, $outFileName)
{
    if(is_file($url)) {
        copy($url, $outFileName); // local source: a plain copy is enough
    } else {
        $fileHandle = fopen($outFileName, 'w');
        $options = array(
          CURLOPT_FILE    => $fileHandle,
          CURLOPT_TIMEOUT => 28800, // set this to 8 hours so we don't time out on big files
          CURLOPT_URL     => $url
        );

        $ch = curl_init();
        curl_setopt_array($ch, $options);
        curl_exec($ch);
        curl_close($ch);
        fclose($fileHandle); // flush and release the output file
    }
}
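
A call to the function could then look like this (a sketch; the URL and local filename are placeholders, and the web server's own limits discussed in the other answers may still apply):

// Placeholder URL and output filename.
downloadUrlToFile("http://www.example.com/file.zip", "file.zip");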
Muhammet Arslan

Explain: perhaps. Remedy: probably not.

It may be caused by the limits of PHP: the manual page for the filesize function mentions, in the section on the return value:

Note: Because PHP's integer type is signed and many platforms use 32bit integers, some filesystem functions may return unexpected results for files which are larger than 2GB.

It seems that the fopen function may cause the issue, as two comments (1, 2) were added (although modded down) on the subject.

It appears as if you need to compile PHP from source (with the CFLAGS="-D_FILE_OFFSET_BITS=64" flag) to enable large files (>2GB), but it might break some other functionality.
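
As a quick check (a minimal sketch), you can ask PHP which integer width your build uses; on a 32-bit build the maximum is 2147483647, i.e. roughly the 2GB boundary discussed above:

<?php
// PHP_INT_SIZE is 4 on 32-bit builds and 8 on 64-bit builds;
// PHP_INT_MAX is 2147483647 on 32-bit builds.
echo PHP_INT_SIZE, " bytes, max int ", PHP_INT_MAX, "\n";
?>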

Since you're using shared hosting: I guess you're out of luck.

Sorry...

Marten Koetsier
  • Is it possible to write a script that divides the downloading file into 1GB parts and then combines them once each individual 1GB part has been downloaded? I assume the source must allow parallel downloads and resuming aborted downloads for this to work. – salmanxk Aug 23 '15 at 14:32
  • @salmanxk: I was just thinking in a similar direction. I'll try another answer... – Marten Koetsier Aug 23 '15 at 14:37

Since the problem occurs at an (as yet) unknown and undefined file size, perhaps it is best to try a work-around. What if you just close and then re-open the output file after some number of bytes?

function chunked_copy() {
    # 1 meg at a time, adjustable.
    $buffer_size = 1048576; 
    # 1 GB write-chunks
    $write_chunks = 1073741824;
    $ret = 0;
    $fin = fopen("http://www.example.com/file.zip", "rb");
    $fout = fopen("file.zip", "w");
    $bytes_written = 0;
    while(!feof($fin)) {
        $bytes = fwrite($fout, fread($fin, $buffer_size));
        $ret += $bytes;
        $bytes_written += $bytes;
        if ($bytes_written >= $write_chunks) {
            // (another) chunk of 1GB has been written, close and reopen the stream
            fclose($fout);
            $fout = fopen("file.zip", "a");  // "a" for "append"
            $bytes_written = 0;  // re-start counting
        }
    }
    fclose($fin);
    fclose($fout);
    return $ret; # return number of bytes written
}

The re-opening should be in append mode, which places the write pointer (there is no read pointer) at the end of the file, so bytes written earlier are not overwritten.

This will not solve any Operating System-level or File System-level issues, but it may solve any counting issue internal to PHP while writing to files.

Perhaps this trick can (or should) also be applied on the reading-end, but I'm not sure if you can perform seek-operations on a download...
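
If the source server supports ranged requests (which is an assumption), one way to "seek" on the reading end is an HTTP Range header via a stream context, sketched below:

// Resume reading at a byte offset; this only works if the remote server
// honors Range requests (assumption).
$offset = 1073741824; // e.g. skip the first 1GB
$context = stream_context_create(array(
    'http' => array(
        'method' => 'GET',
        'header' => "Range: bytes=" . $offset . "-\r\n",
    ),
));
$fin = fopen("http://www.example.com/file.zip", "rb", false, $context);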

Note that any integer overflows (beyond 2147483647 if you're on 32-bit) should be transparently solved by casting to float, so that should not be an issue.

Edit: count the actual number of bytes written, not the chunk size

Marten Koetsier

You get a time-out after 30s, probably caused by PHP (whose default max_execution_time is 30s). You could try setting it to a larger value:

ini_set('max_execution_time', '300');

However, there are some caveats:

  • If the script is running in safe mode, you cannot set max_execution_time with ini_set (I could not find whether Dreamhost has safe mode on or off in shared hosting; you would need to ask them, or just try this).

  • The web server may have an execution limit as well. Apache defaults this to 300s (IIS as well, but given that Dreamhost provides 'full unix shell', Apache is more likely than IIS). But with a file size of 5GB, this should help you out.
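
One way to check whether the PHP limit is actually getting in the way (a minimal sketch): set the value and read it back; if ini_set was blocked, ini_get still reports the old setting.

<?php
ini_set('max_execution_time', '300');
// Reports the value currently in effect for this script.
echo ini_get('max_execution_time'), "\n";
?>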

Marten Koetsier

This is the best way I have found for downloading very large files: it is fast and does not need a lot of memory.

public function download_large_file(string $url, string $dest)
{
    ini_set('memory_limit', '3000M');
    ini_set('max_execution_time', '0');

    try {
        $handle1 = fopen($url, 'r');
        $handle2 = fopen($dest, 'w');

        // Copy the input stream to the output stream chunk by chunk,
        // without loading the whole file into memory.
        stream_copy_to_stream($handle1, $handle2);

        fclose($handle1);
        fclose($handle2);

        return true;
    }
    catch(\Exception $e) {
        return $e->getMessage();
    }
}
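
A call could look like this (a sketch; note that the function above is declared as a class method, so either wrap it in a class or drop the `public` keyword to use it standalone):

// Placeholder URL and destination path.
download_large_file("http://www.example.com/file.zip", "file.zip");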
Jukebox