
I'm using SimpleXML to add a child node to one of my XML documents. When I do a print_r on my SimpleXML object, the < is still displayed as a < in the view source. However, after I save this object back to XML using DOMDocument, the < is converted to &lt; and the > is converted to &gt;.

Any ideas on how to change this behavior? I've tried adding $dom->substituteEntities = false;, but it did no good.

    //Convert SimpleXML element to DOM and save
    $dom = new DOMDocument('1.0');
    $dom->preserveWhiteSpace = false;
    $dom->formatOutput = false;
    $dom->substituteEntities = false;
    $dom->loadXML($xml->asXML());
    $dom->save($filename);

Here is where I'm using the <:

    $new_hint = '<![CDATA[' . $value[0] . ']]>';
    $PrintQuestion->content->multichoice->feedback->hint->Passage->Paragraph->addChild('TextFragment', $new_hint);

The problem is that I'm using SimpleXML to iterate through certain nodes in the XML document, and if an attribute matches a given ID, a specific child node is added with CDATA. Then, after all processing, I save the XML back to file using DOMDocument, which is where the < is converted to &lt;, etc.

Here is a link to my entire class file, so you can get a better idea of what I'm trying to accomplish. Specifically, refer to the hint_insert() method at the bottom.

http://pastie.org/1079562

ThinkingInBits
    `<` is simply not a legal character within an XML element (unless you're in a CDATA section). What are you trying to accomplish? – MvanGeest Aug 06 '10 at 13:19
  • I'm trying to accomplish adding a CDATA tag... Check original post for updated code – ThinkingInBits Aug 06 '10 at 13:20
  • Thanks for the vote down! A CDATA tag is necessary (unless you escape) in XML when you have '<' and '>' within a node! Same as XHTML... – Alex Aug 06 '10 at 13:22
  • Hmm, I didn't give you a vote down – ThinkingInBits Aug 06 '10 at 13:23
  • How is that question different from http://stackoverflow.com/questions/3418796/how-can-i-get-php-simplexml-to-save-as-itself-instead-of-lt? – Gordon Aug 06 '10 at 13:25
  • Because if you would actually read the question, I thought it was SimpleXML causing the problem. Which it is not. Please don't post useless comments in my thread. – ThinkingInBits Aug 06 '10 at 13:27
  • I suggest not trying to add a CDATA section. Add the text you want, and let the XML engine decide if it is going to use CDATA or entities for characters with special meaning. (If you really want to care about the *formatting* of the XML, then you might have to look for other tools, I don't think SimpleXML lets you specify to use CDATA instead of entities). – Quentin Aug 06 '10 at 13:27
  • @ThinkingInBits it's not useless. You simply fail to understand how SimpleXML and DOM works. That's all. Closevoting. – Gordon Aug 06 '10 at 13:30
  • Unfortunately, the people that created our XML reader have their own set ways of handling with XML data, this absolutely must be in a CDATA tag. If I had control over the parser, believe me I would change a lot of things. – ThinkingInBits Aug 06 '10 at 13:32
  • Gordon, REREAD the bottom of my original post. I'll post a link to my entire class file and maybe you can come up with a better way. Specifically refer to the hint_insert method. – ThinkingInBits Aug 06 '10 at 13:33
  • @ThinkingInBits If SimpleXml is not the issue, like you claim, then why don't you just save the XML with SimpleXML, like I have shown you how to do in http://stackoverflow.com/questions/3418376/how-to-save-changed-simplexml-object-back-to-file instead of saving it with DOM? You gain nothing from using DOM for saving the XML except for the formatting, which you do not need at all if the XML is meant to be machine-read anyway. Why do you use this mishmash? Either you need control over the nodes, then use DOM for everything, or you don't need control, then stay within the confines of SimpleXml. – Gordon Aug 06 '10 at 13:37
  • You're right, I'll try this. Selected yours as the answer on that thread by the way. I just assumed it was simpleXML causing the problem at first, I guess we shall see here. – ThinkingInBits Aug 06 '10 at 13:44
  • Ok, my mistake... changed the save to use just $xml->asXML($filename) but the problem still occurs. I guess I could try all of the processing through dom; but is there an easy way to iterate through specific nodes using DOM? – ThinkingInBits Aug 06 '10 at 13:47
  • @ThinkingInBits Thanks and sorry if I came across rude. – Gordon Aug 06 '10 at 13:47
  • @Gordon Ditto.. Voting your answer on the other SimpleXML thread. And thanks for taking the time to help. – ThinkingInBits Aug 06 '10 at 13:51
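On the question of iterating through specific nodes with DOM alone: a minimal sketch using `DOMXPath`, which plays the role of `SimpleXMLElement::xpath()`. The element name `b`, the `id` attribute and the value `id2` are illustrative assumptions, not taken from the poster's class file.

```php
<?php
// Parse the document with DOM only (no SimpleXML involved).
$dom = new DOMDocument('1.0');
$dom->loadXML('<a><b id="id1">a</b><b id="id2">b</b><b id="id3">c</b></a>');

// DOMXPath selects specific nodes, much like SimpleXMLElement::xpath().
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//b[@id="id2"]') as $b) {
    // Create a CDATA section in the same document and attach it to the match.
    $b->appendChild($dom->createCDATASection('0<>1'));
}

$out = $dom->saveXML();
echo $out;
```

Because the CDATA section is a real node rather than a string containing `<![CDATA[`, the serializer writes it out verbatim instead of escaping it.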

2 Answers


SimpleXML and PHP 5's DOM module use the same internal representation of the document (provided by libxml). You can switch between both APIs without having to re-parse the document via simplexml_import_dom() and dom_import_simplexml().
I.e. if you really want/have to perform the iteration with the SimpleXML API, once you've found your element you can switch to the DOM API and create the CDATA section within the same document.

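    <?php
    $doc = new SimpleXMLElement('<a>
      <b id="id1">a</b>
      <b id="id2">b</b>
      <b id="id3">c</b>
    </a>');

    // Find the element with the SimpleXML API...
    foreach( $doc->xpath('b[@id="id2"]') as $b ) {
      // ...then switch to the DOM API for that same node (no re-parsing)
      $b = dom_import_simplexml($b);
      // and append a real CDATA section created by the owning document.
      $cdata = $b->ownerDocument->createCDataSection('0<>1');
      $b->appendChild($cdata);
      unset($b);
    }

    echo $doc->asXML();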

prints

    <?xml version="1.0"?>
    <a>
      <b id="id1">a</b>
      <b id="id2">b<![CDATA[0<>1]]></b>
      <b id="id3">c</b>
    </a>
VolkerK
  • THANK YOU. This is the information I needed. I didn't realize I could use both interchangeably. – ThinkingInBits Aug 06 '10 at 13:50
  • Will setting $b = dom_import_simplexml($b) mess up the iteration? – ThinkingInBits Aug 06 '10 at 14:03
  • "Will setting $b= ... mess up the iteration?" - Obviously not ;-) But use another variable if you like. I just like to have as few variables as possible (within reason, e.g. $cdata could be left out as well) and since $b serves no other purpose I re-used it. – VolkerK Aug 06 '10 at 14:09
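The answer above names both conversion functions, but its example only goes from SimpleXML to DOM. A minimal sketch of the other direction with `simplexml_import_dom()`, assuming you start from a `DOMDocument`; the element names are made up for illustration. Since both APIs wrap the same libxml tree, a node added through DOM is visible through the SimpleXML object without re-parsing.

```php
<?php
$dom = new DOMDocument('1.0');
$dom->loadXML('<a><b id="id1"/></a>');

// Wrap the existing DOM tree in a SimpleXMLElement; nothing is re-parsed.
$sxe = simplexml_import_dom($dom);

// Add a CDATA section to <b> through the DOM API...
$dom->documentElement->firstChild->appendChild($dom->createCDATASection('x<y'));

// ...and read it back through SimpleXML, which sees the same underlying nodes.
$text = (string) $sxe->b;
$xmlOut = $sxe->asXML();
echo $text, "\n", $xmlOut;
```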

The problem is that you're likely adding that as a string, instead of as an element.

So, instead of:

    $simple->addChild('foo', '<something/>');

which will be treated as text, do:

    $child = $simple->addChild('foo');
    $child->addChild('something');

You can't have a literal < in the body of an XML document unless it's the opening of a tag (or it sits inside a CDATA section).
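A minimal sketch comparing the two calls above, using the answer's placeholder names (`foo`, `something`, plus a hypothetical `bar` so both results fit in one document). Passing markup as the value stores it as a text node, so it is escaped on output; building the child with `addChild()` produces a real element.

```php
<?php
$simple = new SimpleXMLElement('<root/>');

// The value is stored as a text node, so "<" and ">" are escaped when serialized.
$simple->addChild('foo', '<something/>');

// Building the child as an element creates real markup instead.
$child = $simple->addChild('bar');
$child->addChild('something');

$out = $simple->asXML();
echo $out;
```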

Edit: After what you describe in the comments, I think you're after:

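DOMDocument::createCDATASection()

    $child = $dom->createCDATASection('your < cdata > body ');
    // $element is a placeholder for the element that should hold the CDATA;
    // a CDATA section cannot be appended to the document node itself.
    $element->appendChild($child);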

Edit2: After reading your edit, there's only one thing I can say:

You're doing it wrong... You can't add elements as a string value for another element. Sorry, you just can't. That's why it's escaping things: DOM and SimpleXML are there to make sure you always create well-formed XML. You need to create the element as an object... So, if you want to create the CDATA child, you'd have to do something like this:

    $child = $PrintQuestion.....->addChild('TextFragment');
    $domNode = dom_import_simplexml($child);
    $cdata = $domNode->ownerDocument->createCDataSection($value[0]);
    $domNode->appendChild($cdata);

That's all there should be to it...

ircmaxell
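Combining this approach with the save routine from the question: a minimal sketch, assuming a simplified document in which `Paragraph` stands in for the poster's `$PrintQuestion->content->multichoice->...->Paragraph` path and `$hintText` is a hypothetical stand-in for `$value[0]`. It checks that a CDATA section created this way survives the `asXML()` / `loadXML()` / `saveXML()` round trip through `DOMDocument`, since `loadXML()` keeps CDATA nodes unless the `LIBXML_NOCDATA` option is passed.

```php
<?php
// Hypothetical, simplified document; the real one comes from the poster's file.
$xml = new SimpleXMLElement('<Passage><Paragraph/></Passage>');
$hintText = 'if x < 5 && y > 2'; // hypothetical stand-in for $value[0]

// Add an empty element, then switch to DOM to give it a CDATA child.
$child = $xml->Paragraph->addChild('TextFragment');
$domNode = dom_import_simplexml($child);
$domNode->appendChild($domNode->ownerDocument->createCDATASection($hintText));

// Same DOM steps as the question's save code, returning a string instead of writing a file.
$dom = new DOMDocument('1.0');
$dom->preserveWhiteSpace = false;
$dom->formatOutput = false;
$dom->loadXML($xml->asXML());
$out = $dom->saveXML();
echo $out;
```

Calling `$dom->save($filename)` instead of `saveXML()` would write the same serialization to disk.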