I'm currently validating XML before reading it, using the following validation function I put together...

public static function validate($xml) {
    libxml_use_internal_errors(TRUE);
    $doc = new DOMDocument('1.0', 'utf-8');
    $doc->loadXML($xml);
    $errors = libxml_get_errors();
    if (empty($errors)) { return TRUE; }
    $error = $errors[0];
    if ($error->level < 3) {
        return TRUE;
    } else {
        return $errors;
    }
}

and once it's been validated we jump into reading the XML using a while loop basically like so...

$reader = new XMLReader();
$reader->xml($xml);
while ($reader->read()) { ... }

The problem is that some XML passes validation but still causes errors during the read() operation. In our case, the body of the while loop is substantial, so it's unrealistic to wrap it in a try/catch block, if that's even possible (I don't think these are exceptions, but correct me if I'm wrong).

What is the best way to capture errors emanating from our while loop's read() method?

The only solution I've been able to find so far is from here: http://www.ibm.com/developerworks/library/x-pullparsingphp/ using the track_errors PHP INI setting, but this seems like overkill for capturing errors in one spot. Is there no other way?


EDIT: The error level coming from the read() errors is E_WARNING, and the exact error message is XMLReader::read(): An Error Occured while reading.

EDIT 2: I've found the following related SO question: Getting PHP's XMLReader to not throw php errors in invalid documents. However, there seem to be conflicting reports in the comments about whether the solutions work. In any case, they're all workarounds.

Community
oucil
  • Which PHP version are you using? This shouldn't be an issue any longer, you must be way behind. PHP 5.3? In case you have this issue, please provide as well the XML that is causing you this problem (next to the PHP version you need support for). – hakre Aug 20 '15 at 20:25
  • And I'm a bit curious why you're re-parsing with XMLReader as you already have it in **DOMDocument**. The interface **DOMDocument** offers normally is more distinct and precise, an iteration similar as `while($node = XMLReader::read()) { ...` should be [easily possible as well](https://github.com/hakre/Iterator-Garden/tree/development/src/DOM) (**DOMNodeIterator** is in document-order, you can filter to elements with **DOMElementFilter**). – hakre Aug 20 '15 at 20:29
  • @hakre Nope, PHP 5.6, and 5.4 on my dev servers and 5.4 in production. There are two use cases which are not apparent in my examples, the validation method I illustrated is a static utility used all over the place in our application/platform, so it needs to be usable independent of anything else, and in this scenario, the DOM is great for quick validation. The other method is where the XML is converted to a structured array, and in this case the DOM can be very low performance for very very large XML files, whereas XMLReader is great for parsing one line at a time. – oucil Aug 20 '15 at 23:38
  • Sure, I didn't want to question your use of **XMLReader** by my comment. Can you isolate the XML that triggers the warnings? – hakre Aug 21 '15 at 05:52
  • @hakre I have a feeling that it's related to non utf8 chars, but that's not what I'm asking, I want to capture the errors, not solve the xml at this point. – oucil Aug 21 '15 at 05:57
  • That's not why I ask for it. Only if it's reproducible can one answer the question of how to capture the errors. Otherwise the general suggestion applies: for libxml errors, use libxml error handling; for PHP errors, use PHP error handling. In your case you need to handle both cases, libxml and PHP. – hakre Aug 21 '15 at 08:14
  • @hakre *ANY* error will cause the situation, so just use any malformed XML and the `read()` method will fail and cause an error. I don't care about the cause of the error at all, I only care about capturing it from within the `while` loop. – oucil Aug 24 '15 at 14:13
  • what if the error causes the while-loop to *not* be entered (again)? You can't handle the error within the while-loop then. For the rest, you should be able to handle that with standard PHP error handling. E.g. you can make use of the error suppression operator and check the last error yourself. But then take care if you don't enter the loop again because of the error condition. – hakre Aug 25 '15 at 13:44
  • @hakre I'm not trying to continue the while loop, but I am trying to gracefully capture the error, record it in the logs, return an internal error code from our method, but continue processing. This is part of a batch operation. The notes from http://stackoverflow.com/questions/14927565/getting-phps-xmlreader-to-not-throw-php-errors-in-invalid-documents indicate that internal errors has no effect, nor do any constants passed in, so the error is getting forced into our error handler rather than being dealt with locally and quietly. – oucil Aug 25 '15 at 18:00
  • that post is about libxml (IIRC), you're also concerned about PHP error handling. that would be the [error suppression operator](https://secure.php.net/manual/en/language.operators.errorcontrol.php) (handle with care as it's easy to introduce problems using it) and checking for errors on your own (ref: https://secure.php.net/manual/en/book.errorfunc.php). if you could provide some XML at least that is able to provoke the one or other error I'm sure I can wrap something up that might show *one* angle on how to deal with that (incl. loop-handling). – hakre Aug 25 '15 at 18:48
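
The suggestion in the comments above (handle libxml errors with libxml error handling and PHP errors with PHP error handling) could be sketched roughly as follows. This is only one possible angle, not a confirmed fix for the asker's case: `readQuietly()` and `parseBatchItem()` are hypothetical helper names, and the sketch swaps the error suppression operator for a temporary `set_error_handler()` wrapped around the single `read()` call, so the loop body itself stays untouched. It assumes that `read()` returns `false` once the parser fails.

```php
<?php
// Hypothetical helper (not part of XMLReader): advances the reader by one
// node and captures any E_WARNING raised by read() into $warning instead of
// letting it reach the application's global error handler.
function readQuietly(XMLReader $reader, &$warning = null)
{
    $warning = null;
    // Temporary handler, active only for this one read() call.
    set_error_handler(function ($errno, $errstr) use (&$warning) {
        $warning = $errstr;
        return true; // treat the warning as handled
    }, E_WARNING);
    $ok = $reader->read();
    restore_error_handler();
    return $ok;
}

// Hypothetical batch step: returns what was read plus any collected errors.
function parseBatchItem($xml)
{
    // Collect libxml parse errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $reader = new XMLReader();
    $reader->XML($xml);

    $nodes = 0;
    while (readQuietly($reader, $warning)) {
        $nodes++; // the existing, substantial loop body would go here
    }
    $reader->close();

    // Merge libxml errors with the PHP warning from the final read(), if any.
    $errors = array();
    foreach (libxml_get_errors() as $e) {
        $errors[] = trim($e->message);
    }
    if ($warning !== null) {
        $errors[] = $warning;
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return array('nodes' => $nodes, 'errors' => $errors);
}
```

A caller in a batch job could then log `$result['errors']` and return its own internal error code when that array is non-empty, and move on to the next item. As noted in the comments, an error also ends the loop, so any cleanup that normally happens inside the loop has to be handled after it.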

0 Answers