I am editing an XML file and need to populate it with data from a database. DOM works, but it does not scale to files of several hundred MB, so I am now using XMLReader and XMLWriter, which can handle very large XML files. Now I need to select a node and add children to it, but I can't find a method to do it. Can someone help me out?

I can find the node I need to add children to by:

if ($xmlReader->nodeType == XMLReader::ELEMENT && $xmlReader->name == 'data') {
    echo 'data was found';
    $data = $xmlReader->getAttribute('data');
}

How do I now add more nodes/children to the found node? To clarify: the code above already reads the file and finds the node, so that part is done. What I need is a way to modify that specific node. Is there a way to do this with XMLWriter? I have read through the class documentation and have not found a method for it.

Paul A.
  • it's **Reader**, it can only read, not write. more info at **[manual](http://php.net/manual/ru/book.xmlreader.php)**. Also **Writer** notes are right here: **[manual](http://www.php.net/manual/ru/book.xmlwriter.php)** – StasGrin Apr 05 '13 at 10:45
  • You don't seem to understand my question. I stated clearly that I am able to edit and write even very large files. I can use DOMDocument, XMLReader and XMLWriter, so I am already familiar with the manuals. XMLReader and XMLWriter get the job done, but I now want to be able to replace a specific node, which I can find with the code I posted. The question here is: how do I modify that specific node with XMLWriter? I have edited the question to make it clearer. Thanks! – Paul A. Apr 05 '13 at 13:13

1 Answer

By default, expanded nodes (the expansion step is missing from your question)

$node = $xmlReader->expand();

are not editable with XMLReader (which makes sense, given the name). However, you can make a specific DOMNode editable by importing it into a new DOMDocument:

$doc  = new DOMDocument();
$node = $doc->importNode($node, true); // true = deep import, so existing children are copied as well

You can then perform any manipulation the DOM offers, for example adding a text node:

$textNode = $doc->createTextNode('New Child TextNode added :)');
$node->appendChild($textNode);

If you prefer SimpleXML for manipulation, you can also import the node into SimpleXML after it has been imported into the DOMDocument:

$xml = simplexml_import_dom($node);
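
As a rough illustration of the SimpleXML route: the sketch below builds a stand-in `data` element directly in a DOMDocument (in the real flow it would come from `importNode()` as shown earlier) and then edits it through SimpleXML. The `item` element and `status` attribute are invented for the example.

```php
<?php
// Stand-in for the node imported from XMLReader::expand().
$doc  = new DOMDocument();
$node = $doc->appendChild($doc->createElement('data'));

// SimpleXML wraps the same underlying node, so edits show up in the DOM too.
$sx = simplexml_import_dom($node);
$sx->addChild('item', 'first');
$sx['status'] = 'updated';

$out = $doc->saveXML($node);
echo $out, "\n";
```

Because both APIs operate on the same node, you can switch between DOM and SimpleXML calls depending on which is more convenient for a given change.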

Here is the example from above, this time using my XMLReader iterators, which simply offer a nicer interface to XMLReader:

$reader  = new XMLReader();
$reader->open($xmlFile);

$elements = new XMLElementIterator($reader, 'data');
foreach ($elements as $element) 
{
    $node = $element->expand();
    $doc  = new DOMDocument();
    $node = $doc->importNode($node, true);
    $node->appendChild($doc->createTextNode('New Child TextNode added :)'));

    echo $doc->saveXML($node), "\n";
}

With the following XML document:

<xml>
    <data/>
    <boo>
        <blur>
            <data/>
            <data/>
        </blur>
    </boo>
    <data/>
</xml>

The small example code above produces the following output:

<data>New Child TextNode added :)</data>
<data>New Child TextNode added :)</data>
<data>New Child TextNode added :)</data>
<data>New Child TextNode added :)</data>
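
The example above only echoes each modified node. If the goal is a complete modified document, one possible approach is to copy the input node by node with XMLWriter and substitute the edited `data` elements along the way. The following is a simplified sketch, not a full solution: the function name `appendTextToData` is hypothetical, it writes to memory rather than to a file, and it drops comments and processing instructions and does no special namespace handling.

```php
<?php
// Hypothetical helper: stream-copy $xml and append $text to every <data> element.
function appendTextToData(string $xml, string $text): string
{
    $reader = new XMLReader();
    $reader->XML($xml);

    $writer = new XMLWriter();
    $writer->openMemory();

    $more = $reader->read();
    while ($more) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'data') {
            // Edit this one element via DOM and write the result verbatim.
            $doc  = new DOMDocument();
            $node = $doc->importNode($reader->expand(), true);
            $node->appendChild($doc->createTextNode($text));
            $writer->writeRaw($doc->saveXML($node));

            // next() skips the subtree that was just written.
            $more = $reader->next();
            continue;
        }

        switch ($reader->nodeType) {
            case XMLReader::ELEMENT:
                // Remember this before moving the cursor onto attributes.
                $isEmpty = $reader->isEmptyElement;
                $writer->startElement($reader->name);
                while ($reader->moveToNextAttribute()) {
                    $writer->writeAttribute($reader->name, $reader->value);
                }
                if ($isEmpty) {
                    $writer->endElement();
                }
                break;
            case XMLReader::END_ELEMENT:
                $writer->endElement();
                break;
            case XMLReader::TEXT:
            case XMLReader::CDATA:
            case XMLReader::WHITESPACE:
            case XMLReader::SIGNIFICANT_WHITESPACE:
                $writer->text($reader->value);
                break;
            // Other node types (comments, PIs, ...) are ignored in this sketch.
        }
        $more = $reader->read();
    }

    $reader->close();
    return $writer->outputMemory();
}

echo appendTextToData('<xml><data/><boo id="7"><data/></boo></xml>', 'added'), "\n";
```

For real files you would presumably use `$reader->open($source)` and `$writer->openUri($target)`, calling `$writer->flush()` periodically so the output buffer does not grow with the document.
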
hakre
  • Your approach is very detailed, and I know it would solve the problem if I were writing a few hundred or a few thousand records from the database. However, there are several hundred thousand records, scaling to over a million. Since DOMDocument holds the XML tree in memory, it fails after a few thousand records. I have used it extensively for this same project and realized it does not scale. I am now working on an alternative solution and will update if it works as required. Thanks for your help! – Paul A. Apr 06 '13 at 03:07
  • @Paulo: If you hold the XML tree in memory *you* are doing it wrong. The example code does not have such an implication. It only holds the single node in memory you want to change - not the whole tree. Which is a precondition to change it: To change something you need to have something. – hakre Apr 06 '13 at 07:46
  • My approach does not hold the XML tree in memory, that is what I communicated but maybe you missed it. In essence, I have used DOM and SimpleXMLElement which could not scale due to the very large size of the files, so my current approach, which works, does NOT hold the XML in memory. The single node I am building is several hundreds of MB data and that was enough to make DOMDocument fail quickly. Thanks for your input. – Paul A. Apr 06 '13 at 20:17
  • @Paulo: You must have misread my comment. As you're using XMLReader I know you're not holding the whole tree in memory. And if you look to the answer you'll see it is using XMLReader as well. It is also using DOMDocument however not for the whole tree but only for nodes that are expanded for DOM manipulation. As I read your question that is exactly what you're asking for: Editing a subset of a larger XML document. – hakre Apr 07 '13 at 09:03
  • @Paulo: And if you don't trust this (or the single node you expand itself is too large), you will need to do more parsing with XMLReader. That's also part in my answer suggesting to use the [XMLReader Iterators](https://github.com/hakre/XMLReaderIterator) as those give a nice interface without writing too complicated XMLReader parser code. If you share more details about your problem and some code-examples I'm pretty sure it's easy to outline another example how it can be used. – hakre Apr 07 '13 at 09:05
  • @hakre I like your approach, however what I think is missing is how to actually modify the file. I think that was the initial goal of Paul. Your code will successfully modify the XML content and output it but it does not solve how to actually write the changes without completely replacing the whole source file. I imagine changing 100 bytes of a 4GB file by rewriting 4GB is not the way to go. – John Oct 12 '16 at 20:01
  • @John: True, DOMDocument in the end holds the whole document in memory, so this perhaps won't work in the 4GB case. In practice the limit comes much earlier; IIRC, something around 600MB or so becomes impractical, although that depends a lot on the file structure, not only on its size on disk. Anyway, what you ask for is demonstrated in another answer, which combines XMLReader with XMLWriter so you can do streaming modifications on really large files: http://stackoverflow.com/a/24716074/367456 – hakre Oct 12 '16 at 21:20