I've tried searching for this issue but couldn't find out why it is actually failing at random stages.

I have a CSV with 2 columns, ID and URL, and 100 rows. The following code loops through the CSV rows, downloads the data at each URL with file_get_contents(), and uses file_put_contents() to write it into files named [ID].jpg.

I am also outputting data to the screen each time around the loop and adding a line to "log.txt". For some reason it just stops working after a random URL and doesn't put anything in the error_log.

Sometimes it's after 17 URLs, sometimes it's after 27, sometimes it's after 49. I am at a loss as to why it fails.

```php
<?php

set_time_limit(3600);

$data = array_map('str_getcsv', file('IDtoImageURL2.csv'));

$total = 0;

echo "Count total: ".count($data)."<br /><br />";

for($i = 1; $i < count($data); $i++)
{
    file_put_contents("log.txt", $data[$i][1]." - ", FILE_APPEND | LOCK_EX);
    $total++;

    $url = $data[$i][1];
    $img = "photos/".$data[$i][0].'.jpg';
    $img2 = "photos/named/".$data[$i][0].'.jpg';

    echo "<b>Getting ".$data[$i][0].":</b><br />";
    echo "URL: ".$data[$i][1]."<br />";
    echo "putting $img<br />";
    echo "putting $img2<br />";

    $imgData = file_get_contents($url);

    if($imgData)
    {
        file_put_contents($img, $imgData);
        file_put_contents($img2, $imgData);

        echo "<b>Done</b><br /><br />";

        file_put_contents("log.txt", "done".PHP_EOL, FILE_APPEND | LOCK_EX);
    }
    else
    {
        echo "<b>Failed</b><br /><br />";

        file_put_contents("log.txt", "failed".PHP_EOL, FILE_APPEND | LOCK_EX);
    }
}

echo "Total: ".$total;

?>
```

Here is the CSV data that I'm working with.

What's strange is that sometimes it posts some of the echo output to the screen and sometimes it just fails silently with a blank screen. The URLs are looped through in the same order, so I don't understand why the results keep differing. Any input would be greatly appreciated.

Thank you for your time

poncho
  • Why use `$data[$i][2]` if your CSV contains only 2 columns? – Syscall May 11 '18 at 13:44
  • Is the script timing out, perhaps? I know you set a long timeout but I don't know how long each of your downloads takes. Have you got PHP errors and warnings switched on so those will be printed to the screen? – ADyson May 11 '18 at 13:45
  • Where is your `str_getcsv` function? Did you try a `var_dump()` of some variables like `$data` or `file('IDtoImageURL.csv')`, just to see whether they contain what you want or something else? – Fanie Void May 11 '18 at 13:47
  • Wikimedia could be throttling you with so many successive image requests so your script just hangs until it hits 3,600 seconds – MonkeyZeus May 11 '18 at 13:49
  • Also, you're logging ineffectively. You should log something with a timestamp directly before and after `$imgData = file_get_contents($url);` so that you can see how long the downloads take. – MonkeyZeus May 11 '18 at 13:51
  • @Syscall I copy/pasted an old version of the code (the csv used to have more columns but I stripped them out), edited now. – poncho May 11 '18 at 14:20
  • @ADyson They are small images (about 500kb each) and so no it's not timing out. Thanks for the idea though. – poncho May 11 '18 at 14:22
  • @FanieVoid My str_getcsv is there at the top, line 2. Yes, I've run var_dump() on $data and it looks like it should. Here it is on pastebin: https://pastebin.com/vyp3iB0r – poncho May 11 '18 at 14:24
  • @MonkeyZeus That could be it but I have done this with CSVs of many more wikimedia URLs in the past (I did something similar with PHP a couple of years ago) and it stops way before 3600s, about 30-60s on average. As for the logging, that's a good idea, I just made that quick log.txt because sometimes nothing is output to screen along with no errors in error_log, I'll change that now. – poncho May 11 '18 at 14:26
  • What you did a few years ago has zero relevance today, Wikimedia could have implemented throttling just yesterday for all we know. Make sure to turn on all error reporting and also log even more steps in your code such as the two `file_put_contents()` calls; for all I know your hard drive is failing and cannot write images to the disk or something. Don't assume that things just magically work. Reduce, deduce, and arrive at a logical conclusion. – MonkeyZeus May 11 '18 at 14:59
  • @MonkeyZeus True, thanks for the advice. I have all error/warning reporting on but nothing new is being written to error_log. I'll add more logging data though and see if I can find where the issue is. Thanks. – poncho May 11 '18 at 15:09
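
The logging suggested in the comments above (a timestamp directly before and after `file_get_contents()`, with error reporting switched on) could be sketched roughly as follows. This is only a minimal sketch, not a confirmed fix: the helper names `logLine` and `timedDownload` and the 30-second per-request timeout are assumptions for illustration and do not appear in the original code.

```php
<?php
// Show every error and warning on screen while debugging.
error_reporting(E_ALL);
ini_set('display_errors', '1');

// Hypothetical helper: append one timestamped line to a log file.
function logLine(string $logFile, string $message): void
{
    $line = date('Y-m-d H:i:s') . ' ' . $message . PHP_EOL;
    file_put_contents($logFile, $line, FILE_APPEND | LOCK_EX);
}

// Hypothetical helper: fetch one URL, logging the start, the outcome,
// and the elapsed time. Returns the body, or null on failure.
function timedDownload(string $url, string $logFile, float $timeoutSeconds = 30.0): ?string
{
    // Assumption: a 30-second timeout per request, so a stalled download
    // fails on its own instead of hanging until set_time_limit() is hit.
    // The 'http' context options only take effect for http(s) URLs.
    $context = stream_context_create(['http' => ['timeout' => $timeoutSeconds]]);

    logLine($logFile, "start $url");
    $start = microtime(true);
    // @ hides the warning on screen; the message is read via error_get_last().
    $body = @file_get_contents($url, false, $context);
    $elapsed = microtime(true) - $start;

    if ($body === false) {
        $error = error_get_last();
        $reason = $error['message'] ?? 'unknown error';
        logLine($logFile, sprintf('failed %s after %.2fs: %s', $url, $elapsed, $reason));
        return null;
    }

    logLine($logFile, sprintf('done %s in %.2fs (%d bytes)', $url, $elapsed, strlen($body)));
    return $body;
}
```

Inside the loop this would replace the bare call, for example `$imgData = timedDownload($url, "log.txt");` followed by a check for `null`. Unlike `if($imgData)`, the `=== false` comparison does not treat an empty response body as a failure. If the last log line is a `start` with no matching `done` or `failed`, the script stopped during the download itself.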
