
I switched my Zend Framework version from 1.11 to 1.12.3. In my tests I now get a strange error that I cannot explain: some of my XML fetching and processing routines fail with:

PHP Fatal error:  Uncaught exception 'Zend_Dom_Exception' with message 
'Invalid XML: Detected use of illegal DOCTYPE' in ....

In Zend Framework 1.11, library/Zend/Dom/Query.php:197 looked like this:

switch ($type) {
    case self::DOC_XML:
        $success = $domDoc->loadXML($document);
        break;
....

In 1.12 the same code looks strange:

switch ($type) {
    case self::DOC_XML:
        $success = $domDoc->loadXML($document);
        foreach ($domDoc->childNodes as $child) {
            if ($child->nodeType === XML_DOCUMENT_TYPE_NODE) {
                require_once 'Zend/Dom/Exception.php';
                throw new Zend_Dom_Exception(
                    'Invalid XML: Detected use of illegal DOCTYPE'
                );
            }
        }
        break;
.....

If I read this right, the routine now refuses to parse any XML document that has a DOCTYPE. A small example that fails on my machine every time:

require_once 'Zend/Dom/Query.php'; 
$f = '<?xml version="1.0" standalone="yes"?>' .
    '<!DOCTYPE hallo [<!ELEMENT hallo (#PCDATA)>]>' .
    '<hallo>Hallo Welt!</hallo>';

$dom = new Zend_Dom_Query($f);
$results = $dom->queryXpath('//hallo');
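
The check can also be reproduced without Zend Framework. The following is only a minimal sketch using plain DOMDocument; `hasDoctype()` is a hypothetical helper name, not part of ZF. It shows that a DOCTYPE declaration ends up as a child node of the document with type `XML_DOCUMENT_TYPE_NODE`, which is exactly what the 1.12 loop looks for:

```php
<?php
// Same document as in the failing example above.
$xml = '<?xml version="1.0" standalone="yes"?>' .
    '<!DOCTYPE hallo [<!ELEMENT hallo (#PCDATA)>]>' .
    '<hallo>Hallo Welt!</hallo>';

/**
 * Hypothetical helper: returns true if the parsed document
 * contains a DOCTYPE node among its top-level children.
 */
function hasDoctype($xml)
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        throw new RuntimeException('Could not parse XML');
    }
    foreach ($doc->childNodes as $child) {
        if ($child->nodeType === XML_DOCUMENT_TYPE_NODE) {
            return true;
        }
    }
    return false;
}

var_dump(hasDoctype($xml));                         // document with DOCTYPE
var_dump(hasDoctype('<hallo>Hallo Welt!</hallo>')); // document without DOCTYPE
```

So the exception does not depend on what the DOCTYPE declares; its mere presence is enough.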

Can someone explain this to me? I tested with Zend Framework 1.12.3 on PHP 5.3.2 and 5.4.6.

Rolando Isidoro
jami

2 Answers


I read it the same way as you did. Googled about it for a while and found the following in the HTML <!DOCTYPE> Declaration article from w3schools:

The <!DOCTYPE> declaration must be the very first thing in your HTML document, before the <html> tag.

I've coded a small test based on your example and just moved the <!DOCTYPE> declaration to the top of your XML and it seems to work:

<?php
require_once 'Zend/Dom/Query.php'; 
$f = <<<XML
<!DOCTYPE hallo [<!ELEMENT hallo (#PCDATA)>]>
<?xml version="1.0" standalone="yes"?>
<hallo>Hallo Welt!</hallo>
XML;

$dom     = new Zend_Dom_Query($f);
$results = $dom->queryXpath('//hallo');

foreach ($results as $result) {
    echo $result->C14N();
}

Output:

<hallo>Hallo Welt!</hallo>
Rolando Isidoro
  • Hm. Is this intended? I read the whole XML spec (fifth edition), and even there I find examples that look like mine. – jami Jun 28 '13 at 11:04
  • [W3 xml spec](http://www.w3.org/TR/REC-xml/#sec-physical-struct). Have you tried this in a validator? It says that my original example is valid, and the component should support what the spec allows. Furthermore, I can't restructure the XML documents: they are feeds. And I still think they are syntactically correct – jami Jun 28 '13 at 11:26
  • I did run your original example through W3's validator and it did return valid. With my change it isn't, but it does work in ZF. Since you can't restructure the XML document, is using PHP's DOMDocument an option for you? – Rolando Isidoro Jun 28 '13 at 13:37
  • I think I will go your way and replace Zend_Dom_Query directly with DOMDocument. It might also be that your DOCTYPE-first technique bypasses the security mechanism in ZF :D I will test this. Thanks for the help – jami Jul 01 '13 at 11:04

OK, I had a short talk with Matthew Weier O'Phinney about why DOCTYPEs are no longer accepted. The reason is the security patch described in http://framework.zend.com/security/advisory/ZF2012-02

They disabled DOCTYPE support to prevent XXE (XML external entity) and XEE (XML entity expansion) attacks.

"I closed the report because it's something we cannot fix, due to security implications. It doesn't matter if it's valid XML -- XEE and XXE vectors utilize perfectly valid XML in order to exploit issues in the underlying XML parser. Because we cannot control what version of libxml is used in every PHP distribution on which ZF is deployed, we must be defensive in our code. Furthermore, the moment we add a switch to disable the XEE and XXE vector checks, folks will use that switch without understanding the reason behind them.

There are a number of tools you can use to pre-process XML -- including pandoc or the PCRE tools in PHP -- if you cannot control the source of the XML and still want to parse it with our tools."
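
The PCRE route mentioned in the quote could be sketched roughly as follows. This is a naive illustration, not a vetted sanitizer: `stripDoctype()` is a hypothetical helper, and the pattern assumes a single DOCTYPE whose internal subset contains no `]>` inside quoted values. Removing the DOCTYPE also removes any entities it declares, so documents that reference custom entities will no longer parse.

```php
<?php
/**
 * Hypothetical helper: removes the first DOCTYPE declaration,
 * including an optional internal subset in [...] brackets.
 */
function stripDoctype($xml)
{
    $stripped = preg_replace(
        '/<!DOCTYPE\s+[^>\[]*(?:\[.*?\])?\s*>/s',
        '',
        $xml,
        1
    );
    if ($stripped === null) {
        throw new RuntimeException('PCRE error while stripping DOCTYPE');
    }
    return $stripped;
}

$xml = '<?xml version="1.0" standalone="yes"?>' .
    '<!DOCTYPE hallo [<!ELEMENT hallo (#PCDATA)>]>' .
    '<hallo>Hallo Welt!</hallo>';

$clean = stripDoctype($xml);

// Parsed here with plain DOMDocument so the sketch runs without ZF;
// the cleaned string is what you would hand to Zend_Dom_Query instead.
$doc = new DOMDocument();
$doc->loadXML($clean);
echo $doc->documentElement->textContent, "\n";
```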

I mentioned that libxml2 itself already fixed this in 2012, but he argued that they cannot know which version of libxml2 a given installation uses.

So what are the solutions?

  1. Use an XML preprocessor
  2. Write a patch that removes these checks (only if you are sure that you use a libxml2 version patched against XXE and XEE)
  3. Write your own components
  4. Use the PHP classes SimpleXMLElement or DOMDocument directly
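
Option 4 might look roughly like this. It is a sketch under assumptions, not a drop-in replacement for Zend_Dom_Query: `queryXml()` is a hypothetical helper, and it does not replicate ZF's own DOCTYPE check, so it should only be used when the libxml2 behind your PHP build is known to be patched.

```php
<?php
/**
 * Hypothetical helper: parses an XML string with DOMDocument
 * and returns the DOMNodeList matching the XPath expression.
 */
function queryXml($xml, $xpathExpr)
{
    // libxml_disable_entity_loader() is deprecated as of PHP 8.0, where
    // libxml2 >= 2.9 already disables external entity loading by default,
    // so it is only called on older versions.
    $previous = null;
    if (PHP_VERSION_ID < 80000) {
        $previous = libxml_disable_entity_loader(true);
    }

    $doc = new DOMDocument();
    // LIBXML_NONET forbids network access while loading the document.
    $ok = $doc->loadXML($xml, LIBXML_NONET);

    if ($previous !== null) {
        libxml_disable_entity_loader($previous); // restore previous setting
    }
    if (!$ok) {
        throw new RuntimeException('Could not parse XML');
    }

    $xpath = new DOMXPath($doc);
    return $xpath->query($xpathExpr);
}

$xml = '<?xml version="1.0" standalone="yes"?>' .
    '<!DOCTYPE hallo [<!ELEMENT hallo (#PCDATA)>]>' .
    '<hallo>Hallo Welt!</hallo>';

foreach (queryXml($xml, '//hallo') as $node) {
    echo $node->C14N(), "\n";
}
```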

Thank you Rolando Isidoro for the help :)

jami