
I am using simplexml_load_file to parse an XML file that must follow a DTD. Both XML and DTD are local files.

    $obj_xml = simplexml_load_file(
        $str_xml_file,
        'SimpleXMLElement',
        LIBXML_DTDVALID | LIBXML_NOENT
    );
    if (false === $obj_xml) {
        throw new Exception("XML file is not valid");
    }

The XML file is something like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE mapping SYSTEM 'mapping.dtd' [
    <!ENTITY data_file "data.csv">
    ]>
    <mapping>
    ...
    </mapping>

I was under the impression that if the XML was not valid according to the specified DTD, simplexml_load_file would return false, but it doesn't. I have also tried checking whether $obj_xml is an instance of the LibXMLError class, with the same result.

It seems the DTD is completely ignored by simplexml_load_file. I have tried changing its name to something non-existent, and there is still no error.

As I said, both the XML and DTD are local files. $str_xml_file is the absolute full pathname of the XML file and the DTD resides in the same directory.

hakre
giuliot

1 Answer


Nothing in the documentation I see on the Web suggests that simplexml_load_file performs validation. Indeed, I see nothing in the SimpleXML documentation that suggests it can validate XML at all.

The [consensus](http://www.ibm.com/developerworks/xml/library/x-simplexml/index.html) seems to be that SimpleXML has no validation facilities; IBM DeveloperWorks has an article that suggests that you will need to use the DOM interface to validate.

C. M. Sperberg-McQueen
  • Well, the fact that you can pass additional libxml parameters, and that one of those parameters is LIBXML_DTDVALID, made me think that simplexml_load_file could perform validation. I will read that article and see if it helps. Thanks – giuliot Sep 14 '12 at 08:04
  • Ah, good point! I overlooked that. Hmm. In that case, have you tried [libxml_use_internal_errors()](http://www.php.net/manual/en/function.libxml-use-internal-errors.php) and [libxml_get_errors()](http://www.php.net/manual/en/function.libxml-get-errors.php)? – C. M. Sperberg-McQueen Sep 14 '12 at 15:42
  • Well, I ended up creating a DOMDocument from the file, validating it and then, if everything is fine, using simplexml_import_dom to get a SimpleXMLElement object. It works very well, and by using libxml_get_errors() I was able to customise the error message, thanks to [this article](http://www.ibm.com/developerworks/library/x-validxphp/index.html) – giuliot Sep 17 '12 at 15:02
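
The approach described in the last comment could look roughly like the sketch below. It is not the asker's actual code: the helper name `load_validated_xml`, the use of `RuntimeException`, and the sample documents are assumptions made for illustration. To stay self-contained it parses a string with an inline DTD rather than a file with an external `mapping.dtd`.

```php
<?php
// Hypothetical helper (name and exception type are illustrative only):
// validate XML against its DTD with DOMDocument, then hand the result
// to SimpleXML.
function load_validated_xml(string $xml): SimpleXMLElement
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $dom = new DOMDocument();
    // loadXML() checks well-formedness; validate() checks against the DTD.
    $ok = $dom->loadXML($xml) && $dom->validate();

    // Build a custom message from the collected LibXMLError objects.
    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$ok) {
        throw new RuntimeException("XML is not valid:\n" . implode("\n", $messages));
    }

    return simplexml_import_dom($dom);
}

// Sample documents (assumed for this sketch, with the DTD inlined).
$valid = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE mapping [
<!ELEMENT mapping (field+)>
<!ELEMENT field (#PCDATA)>
]>
<mapping><field>name</field></mapping>
XML;

// Same DTD, but the body uses an element the DTD does not declare.
$invalid = str_replace('<field>name</field>', '<column>name</column>', $valid);

try {
    $obj_xml = load_validated_xml($valid);
    echo $obj_xml->field, "\n";
    load_validated_xml($invalid);
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n";
}
```

For a file on disk, `$dom->load($str_xml_file)` would take the place of `loadXML()`. Both methods accept libxml option constants as a second argument, so a flag such as LIBXML_NOENT from the question could be passed there if entity substitution is needed.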