20

I am making use of simplehtmldom, which has this function:

// get html dom form file
function file_get_html() {
    $dom = new simple_html_dom;
    $args = func_get_args();
    $dom->load(call_user_func_array('file_get_contents', $args), true);
    return $dom;
}

I use it like so:

$html3 = file_get_html(urlencode(trim("$link")));

Sometimes a URL is simply not valid, and I want to handle that. I thought I could use try/catch, but that doesn't work because no exception is thrown; PHP just emits a warning like this:

[06-Aug-2010 19:59:42] PHP Warning:  file_get_contents(http://new.mysite.com/ghs 1/) [<a href='function.file-get-contents'>function.file-get-contents</a>]: failed to open stream: HTTP request failed! HTTP/1.1 404 Not Found  in /home/example/public_html/other/simple_html_dom.php on line 39

Line 39 is in the above code.

How can I handle this error correctly? Can I just use a plain if condition? It doesn't look like it returns a boolean.

Thanks all for any help

Update

Is this a good solution?

if(fopen(urlencode(trim("$next_url")), 'r')){

    $html3 = file_get_html(urlencode(trim("$next_url")));

}else{
    //do other stuff, error_logging
    return false;

}
Abs
  • why it is called via url, not filename? – Your Common Sense Aug 07 '10 at 16:45
  • @Col, its a remote file. – Abs Aug 07 '10 at 16:47
  • Somebody around here has an itchy downvote finger. – grossvogel Aug 07 '10 at 16:48
  • For the person that neg repped my question. Don't hide yourself, comment below as to why you think this question deserves a neg rep. – Abs Aug 07 '10 at 16:48
  • Why is it a remote file? Why do you use a filesystem function to read an HTTP resource? – Your Common Sense Aug 07 '10 at 16:50
  • 1
    Well its a good question, so +1 – TheLQ Aug 07 '10 at 16:51
  • 2
    @Abs It's probably because you are downvoting all answers for no reason. – NullUserException Aug 07 '10 at 16:51
  • @NullUserException - I have only down voted 2 questions, those coming up with solutions that suppress errors rather than handling them. Also that is NO reason to down vote the question at all. You probably don't see this now as some questions have been edited. – Abs Aug 07 '10 at 16:53
  • What do you think of my updated solution? – Abs Aug 07 '10 at 16:55
  • @Col - I am not sure why filesystem functions are used. You need to ask the guys who wrote the class I mentioned in my question. Mind you, its a popular class. – Abs Aug 07 '10 at 16:56
  • Use CURL. I don't understand why the lib you are using does not already use CURL to be honest. It should only fall back to file_get_contents() when the CURL lib is unavailable. – Treffynnon Aug 07 '10 at 16:58
  • 1
    @Abs (and Col): How is using @ and the return code less legitimate 'error handling' than catching an exception, or the solution you currently have up there under 'Is this a good solution?'. Just because of the @? – grossvogel Aug 07 '10 at 16:58
  • See my updated answer. I also point out why error suppression is as valid as your solution. – quantumSoup Aug 07 '10 at 17:12
  • @Col Thanks for downvoting my answer without reading the explanation as to why I used `@` – quantumSoup Aug 07 '10 at 17:16
  • 2
    Suggested third party alternatives that actually use DOM instead of String Parsing: [phpQuery](http://code.google.com/p/phpquery/), [Zend_Dom](http://framework.zend.com/manual/en/zend.dom.html), [QueryPath](http://querypath.org/) and [FluentDom](http://www.fluentdom.org). – Gordon Aug 07 '10 at 17:33

5 Answers

17

Here's an idea:

function fget_contents() {
    $args = func_get_args();
    // the @ can be removed if you lower error_reporting level
    $contents = @call_user_func_array('file_get_contents', $args);

    if ($contents === false) {
        // $file was never defined here; the first argument is the filename/URL
        throw new Exception('Failed to open ' . $args[0]);
    } else {
        return $contents;
    }
}

Basically a wrapper around file_get_contents that throws an exception on failure. To avoid having to override file_get_contents itself, you can change the call inside the library:

// change this
$dom->load(call_user_func_array('file_get_contents', $args), true); 
// to
$dom->load(call_user_func_array('fget_contents', $args), true); 

Now you can:

try {
    $html3 = file_get_html(trim("$link")); 
} catch (Exception $e) {
    // handle error here
}

Error suppression (either by using @ or by lowering the error_reporting level) is a valid solution. The wrapper above throws an exception on failure, and you can use that to handle your errors. There are many reasons why file_get_contents might generate warnings, and PHP's manual itself recommends lowering error_reporting: see the manual.

quantumSoup
  • 1
    You can capture the output of @file_get_contents... If it's === FALSE, then you can throw your own exception, set a return code, or whatever. – grossvogel Aug 07 '10 at 16:38
  • 1
    @quantumSoup: Be aware that your first example `if(file_get...` will give an error if an empty file is read successfully, whereas the second one `if($contents === false)` will only return an error if there really is an error. – grossvogel Aug 07 '10 at 16:46
  • @gross Yes, I forgot to check for identical to false on the first one. Got rid of the whole first part though. – quantumSoup Aug 07 '10 at 16:47
  • @quantumSoup - I have tried the above after editing the simplehtmldom class, view it here: http://pastebin.com/5TrEJqQF - I get the error: Fatal error: Cannot redeclare fget_contents – Abs Aug 07 '10 at 17:17
  • @Abs Apparently you have a `fget_contents` declared somewhere. Rename the function to `fgc_with_exception`, `file_get_exception`, or whatever (and rename the call in the library accordingly) – quantumSoup Aug 07 '10 at 17:24
  • @Abs Here's the [modified class](http://pastebin.com/TeNXUiTW), and here's a [usage example](http://pastebin.com/cG7EX6GM). It works here. – quantumSoup Aug 07 '10 at 17:32
  • @quantumSoup - Thanks, I couldn't find another function with the same name, but I have renamed it to `fget_contents_893`. However, it is always throwing an exception, and my script that used to take hours to complete now finishes within 10 seconds. It's not returning any HTML. I am checking what the problem is now, any ideas? – Abs Aug 07 '10 at 17:38
  • 1
    Strange, it seems file_get_contents doesn't like encoded urls. I have removed that and it seems to be executing fine. – Abs Aug 07 '10 at 17:41
  • Well this has been a rough question, at least its working now. Thank you very much quantumSoup for your continued help. :) – Abs Aug 07 '10 at 17:44
  • 1
    @Abs Indeed, you are not supposed to pass encoded URL's to any of PHP's file functions – quantumSoup Aug 07 '10 at 19:08
  • I'm passing in a url with query params, but the error message is printing out as if i passed it an encoded url `&` could that be indicative of a problem? I define the url exactly the line before making the call, and its not encoded there. Note, the errors are only sporadic, I thought it was just the API, now I wonder based on @Abs comment on this. – blamb Nov 30 '17 at 22:05
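
The comments above trace the failure to passing the whole URL through urlencode, which also escapes the scheme and slashes (for example `http://` becomes `http%3A%2F%2F`). If the goal was only to cope with characters such as the space in `ghs 1`, one option is to percent-encode just the path segments. The following is a minimal sketch under that assumption; `encode_url_path` is a hypothetical helper name, not part of simplehtmldom or PHP, and it ignores userinfo and fragments.

```php
<?php
// Hypothetical helper (not part of simplehtmldom): percent-encode only the
// path segments of an absolute URL, leaving scheme, host, port and query as-is.
// Userinfo and fragment are ignored in this sketch.
function encode_url_path(string $url): string
{
    $parts = parse_url(trim($url));
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        throw new InvalidArgumentException("Not an absolute URL: $url");
    }

    // Decode first so that already-encoded segments are not encoded twice.
    $segments = explode('/', $parts['path'] ?? '/');
    $encoded  = array_map(
        function (string $segment): string {
            return rawurlencode(rawurldecode($segment));
        },
        $segments
    );

    $result = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $result .= ':' . $parts['port'];
    }
    $result .= implode('/', $encoded);
    if (isset($parts['query'])) {
        $result .= '?' . $parts['query'];
    }
    return $result;
}
```

The result can then be passed to file_get_html in place of `urlencode(trim("$link"))`.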
4

Use CURL to get the URL and handle the error response that way.

Simple example from curl_init():

<?php
// create a new cURL resource
$ch = curl_init();

// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.example.com/");
curl_setopt($ch, CURLOPT_HEADER, 0);

// grab URL and pass it to the browser
curl_exec($ch);

// close cURL resource, and free up system resources
curl_close($ch);
?>
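
The manual example above sends the response straight to the output. To get the body back as a string and react to a 404, the CURLOPT_RETURNTRANSFER option mentioned in the comments can be combined with a status check. This is a sketch only; `fetch_url_or_throw` and its timeout default are hypothetical choices, not part of simplehtmldom.

```php
<?php
// Hypothetical helper: fetch a URL with cURL and throw on transport errors
// or on an HTTP status of 400 or above.
function fetch_url_or_throw(string $url, int $timeoutSeconds = 10): string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);

    $body   = curl_exec($ch);
    $error  = curl_error($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    // The handle is released when $ch goes out of scope at the end of the function.

    if ($body === false) {
        throw new RuntimeException("Request to $url failed: $error");
    }
    if ($status >= 400) {
        throw new RuntimeException("Request to $url returned HTTP $status");
    }
    return $body;
}
```

The returned string could then be handed to the `load()` method that file_get_html already calls, inside a try/catch that logs the failure.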
Treffynnon
  • So the idea is to check the URL first before passing it to file_get_contents, I think that is a good idea. I think its better to use fopen. See my update, what do you think? – Abs Aug 07 '10 at 16:46
  • No, CURL will return the contents for you so there will be no need for a subsequent file_get_contents. – Treffynnon Aug 07 '10 at 16:48
  • You'll be interested in CURLOPT_RETURNTRANSFER in curl_setopt(): http://uk3.php.net/manual/en/function.curl-setopt.php – Treffynnon Aug 07 '10 at 16:49
  • I can not use the CURL or fopen to get the contents. I need to still make use of the file_get_contents provided by the API, I just need a check to see if file_get_contents hasn't returned an error and do something if it has. Again, I can not directly fiddle with file_get_contents as it is within the API of simplehtmldom like I have mentioned in my question. – Abs Aug 07 '10 at 16:51
  • I would just override that method in the library in my own class as its behaviour is clearly unsuitable in this instance. – Treffynnon Aug 07 '10 at 16:53
  • 1
    +1 Use CURL where available for all HTTP requests... It's designed for it... `file_get_contents` will work (in most cases), but it's not designed for HTTP, so it'll be quite hard to detect certain types of errors, etc... – ircmaxell Aug 07 '10 at 16:55
1

From my POV, good error handling is one of the big challenges in PHP. Fortunately you can register your own Error Handler and decide for yourself what to do.

You can define a fairly simple error handler like this:

function throwExceptionOnError(int $errorCode , string $errorMessage) {
    // Usually you would check if the error code is serious
    // enough (like E_WARNING or E_ERROR) to throw an exception
    throw new Exception($errorMessage);
}

and register it in your function like so:

function file_get_html() {
    $dom = new simple_html_dom;
    $args = func_get_args();
    set_error_handler("throwExceptionOnError");
    try {
        $dom->load(call_user_func_array('file_get_contents', $args), true);
    } finally {
        // restore the previous handler even when an exception was thrown
        restore_error_handler();
    }
    return $dom;
}
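
The handler above throws for every error it receives. The severity check hinted at in its comment could look roughly like the sketch below, which converts only warnings into PHP's built-in ErrorException and leaves everything else to the default handling. `file_get_contents_strict` is a hypothetical name, and limiting the check to warnings is an assumption about what counts as serious enough.

```php
<?php
// Hypothetical helper: call file_get_contents with warnings converted
// to ErrorException for the duration of the call.
function file_get_contents_strict(string $filename): string
{
    set_error_handler(
        function (int $severity, string $message, string $file, int $line): bool {
            if ($severity & (E_WARNING | E_USER_WARNING)) {
                throw new ErrorException($message, 0, $severity, $file, $line);
            }
            return false; // fall through to PHP's default handling
        }
    );

    try {
        $contents = file_get_contents($filename);
    } finally {
        // Always put the previous handler back, even after an exception.
        restore_error_handler();
    }

    if ($contents === false) {
        throw new RuntimeException("Failed to read $filename");
    }
    return $contents;
}
```

A caller can then wrap the call in try/catch for ErrorException and log or skip the URL.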
madmuffin
1

To see why a file_get_contents call might have failed, you could just use PHP's error_get_last function:

$contents = file_get_contents($url);
// compare with false explicitly: an empty response is falsy but not an error
if ($contents !== false) {
    // it worked
}
else {
   die("Failed to fetch ".$url.", error: ".error_get_last()['message']);
}
Joey Rich
0

If you're fetching from an external URL, the best error handling will come from using an HTTP library such as Zend_Http. This isn't much different from using CURL or fopen, except that it abstracts the particulars of these "drivers" behind a universal API, so you can choose which one you want to use. It also has some built-in error trapping to make things easier for you.

If you don't want the overhead of another library, you can obviously code it yourself, in which case I always prefer CURL.

Lorenz Meyer
prodigitalson