I'm using XMLReader to find text in an Office Open XML document and XMLWriter to write it to an XLIFF file. I then modify the text in the other XML file, and now I want to rebuild the OpenXML document. I am using the XML iterator class as suggested in this question.

I want to replace each node's content in the original file with the matching node's content from the XLIFF file, checking whether the node count equals the id attribute. So the 10th <w:t> node will be replaced with the content of <g id="10">, if it exists.

What's happening now is that my code does not replace the tag contents. It generates empty self-closing tags and places the original content after them. Right after the first such tag it closes the document.

xliff file - segments.xliff

<?xml version="1.0"?>
<xliff>
 <file original="/home/brgwe507/public_html/previas/wp-content/uploads/sites/9/2015/03/Cap32.docx" datatype="x-noveritis" source-language="pt-BR">
  <body>
   <trans-unit id="177">
    <source><g id="217">In a thermodynamic process, energy is transferred to or from a system by two primary methods.</g></source><seg-source><mrk mtype="seg" id="1"><g id="217">In a thermodynamic process, energy is transferred to or from a system by two primary methods.</g></mrk></seg-source>
    <target><mrk mtype="seg" id="1"><g id="217">tradução segmento1.</g></mrk> </target>
   </trans-unit>
   <trans-unit id="178">
    <source><g id="217">The first method to be considered is work and the second, which will follow in Section 3.2, is heat transfer.</g></source><seg-source><mrk mtype="seg" id="2"><g id="217">The first method to be considered is work and the second, which will follow in Section 3.2, is heat transfer.</g></mrk></seg-source>
    <target><mrk mtype="seg" id="2"><g id="217">tradução segmento 2</g></mrk> </target>
   </trans-unit>
   <trans-unit id="179">
    <source><g id="218">Work, designated </g><g id="219">W</g><g id="220">, is defined in mechanics as the product of a force and the distance moved in the direction of the force.</g></source><seg-source><mrk mtype="seg" id="3"><g id="218">Work, designated </g><g id="219">W</g><g id="220">, is defined in mechanics as the product of a force and the distance moved in the direction of the force.</g></mrk></seg-source>
    <target><mrk mtype="seg" id="3"><g id="218">tradução</g><g id="219">teste</g><g id="220">, segmento 3</g></mrk> </target>
   </trans-unit>
   <trans-unit id="180">
    <source><g id="220">A more general definition of work is used in thermodynamics:</g><g id="221">Work</g><g id="222">, an interaction between a system and its surroundings, is done by a system if the sole external effect on the surroundings could be the raising of a weight.</g></source><seg-source><mrk mtype="seg" id="4"><g id="220">A more general definition of work is used in thermodynamics:</g><g id="221">Work</g><g id="222">, an interaction between a system and its surroundings, is done by a system if the sole external effect on the surroundings could be the raising of a weight.</g></mrk></seg-source>
    <target><mrk mtype="seg" id="4"><g id="220">tradução deste segmento:</g><g id="221">para</g><g id="222">teste de tradução segmento 4.</g></mrk> </target>
   </trans-unit>
   <trans-unit id="181">
    <source><g id="222">The magnitude of the work is the product of the weight and the distance it could be </g><g id="223">lifted.This</g><g id="224"> definition allows a battery to do work since the energy produced by the battery could be the lifting of a weight, as suggested in Fig.</g></source><seg-source><mrk mtype="seg" id="5"><g id="222">The magnitude of the work is the product of the weight and the distance it could be </g><g id="223">lifted.This</g><g id="224"> definition allows a battery to do work since the energy produced by the battery could be the lifting of a weight, as suggested in Fig.</g></mrk></seg-source>
    <target><mrk mtype="seg" id="5"><g id="222">tradução para teste </g><g id="223">xliff.</g><g id="224"> semgneto 5 ladsfoienfoqeiwnf</g></mrk> </target>
   </trans-unit>
   <trans-unit id="182">
    <source><g id="224">3.2.Work has unit</g><g id="225">s of N </g><g id="226">[S]</g><g id="227"> </g><g id="228">m 5 J.</g></source><seg-source><mrk mtype="seg" id="6"><g id="224">3.2.Work has unit</g><g id="225">s of N </g><g id="226">[S]</g><g id="227"> </g><g id="228">m 5 J.</g></mrk></seg-source>
    <target><mrk mtype="seg" id="6"><g id="224">3.2. teste</g><g id="225">1 de 7 </g><g id="226">[S]</g><g id="227"> </g><g id="228">segmento.</g></mrk> </target>
   </trans-unit>
   <trans-unit id="183">
    <source><g id="228">The work done per unit mass, or </g><g id="229">specific work</g><g id="230">, is</g></source><seg-source><mrk mtype="seg" id="7"><g id="228">The work done per unit mass, or </g><g id="229">specific work</g><g id="230">, is</g></mrk></seg-source>
    <target><mrk mtype="seg" id="7"><g id="228">Para tradução </g><g id="229">segmento</g><g id="230">, é</g></mrk> </target>
   </trans-unit>
  </body>
 </file>
</xliff>

original document.xml to be updated

<w:document xmlns:ve="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml">
<w:body>
<w:p w:rsidR="000C0514" w:rsidRPr="004F10D0" w:rsidRDefault="004F10D0" w:rsidP="004F10D0">
<w:pPr>
<w:rPr>
<w:b/>
</w:rPr>
</w:pPr>
<w:r w:rsidRPr="004F10D0">
<w:rPr>
<w:b/>
</w:rPr>
<w:t>CHAPTER 3</w:t>
</w:r>
</w:p>
...
<w:p w:rsidR="000C0514" w:rsidRPr="009D4166" w:rsidRDefault="004F10D0" w:rsidP="004F10D0">
<w:pPr>
<w:rPr>
<w:b/>
</w:rPr>
</w:pPr>
<w:r w:rsidRPr="009D4166">
<w:rPr>
<w:b/>
</w:rPr>
<w:t>Figure 3.57</w:t>
</w:r>
</w:p>
<w:sectPr w:rsidR="000C0514" w:rsidRPr="009D4166" w:rsidSect="004F10D0">
<w:headerReference w:type="even" r:id="rId7"/>
<w:pgSz w:w="11905" w:h="16840"/>
<w:pgMar w:top="1417" w:right="1701" w:bottom="1417" w:left="1701" w:header="0" w:footer="1305" w:gutter="0"/>
<w:cols w:space="720"/>
</w:sectPr>
</w:body>
</w:document>

PHP Code

    $xmlInputFile  = 'document.xml';
    $xmlOutputFile = 'new_document.xml';
    $xmlxliff = 'segments.xliff';

    $reader = new XMLReader();
    $reader->open($xmlInputFile);

    $writer = new XMLWriter();
    $writer->openUri($xmlOutputFile);

    $iterator = new XMLWritingIteration($writer, $reader);

    $segmentos = new XMLReader();
    $segmentos->open($xmlxliff);

    $writer->startDocument();
    $t=0;
    foreach ($iterator as $node) {
        $isElement = $node->nodeType === XMLReader::ELEMENT;

        if ($isElement && $node->name === 'w:t') {
        // increase <w:t> counter and find the same g id in the xliff
        $t++;
        $writer->startElement($node->name);
            while ($segmentos->read()){
                if ($segmentos->nodeType == XMLREADER::ELEMENT && $segmentos->name === 'g'){
                $gid = $segmentos->getAttribute('id');
                if ($gid === $t){
                    $texto = $segmentos->readInnerXML();
                    $writer->text($texto);
                }
                }
            }
            $writer->endElement();
        }else {
        // handle everything else
        $iterator->write();
        }
    }
    $writer->endDocument();

And the output in new_document.xml

<?xml version="1.0"?>
<w:document xmlns:ve="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml">
 <w:body>
  <w:p w:rsidR="000C0514" w:rsidRPr="004F10D0" w:rsidRDefault="004F10D0" w:rsidP="004F10D0">
   <w:pPr>
    <w:rPr>
     <w:b/>
    </w:rPr>
   </w:pPr>
   <w:r w:rsidRPr="004F10D0">
    <w:rPr>
    <w:b/> 
    </w:rPr>
     <w:t/><--self closing <w:t> tag
    CHAPTER 3 <-- original text was not replaced and now is outside the tag
    </w:r>
   </w:p>
  </w:body> <-- body closing tag after first paragraph
</w:document> <-- document closing tag
<w:p w:rsidR="000C0514" w:rsidRPr="004F10D0" w:rsidRDefault="000C0514" w:rsidP="004F10D0"/> <-- more content after document closing tag
<w:p w:rsidR="004F10D0" w:rsidRDefault="004F10D0" w:rsidP="004F10D0">... 
Ricardo Gonçalves
  • Are you aware of [this answer of *"PHP XMLReader read , edit Node , write XMLWriter"*](http://stackoverflow.com/a/24716074/367456) ? – hakre Mar 11 '15 at 18:44
  • Yes! I've tried your iterator class, and I'm using it in other parts of the code, but it's not replicating all the other nodes in this case. I didn't post a question about it because I thought it would be too specific. Should I edit it here for you or create a new question about it? – Ricardo Gonçalves Mar 11 '15 at 18:58
  • Which nodes are you missing? (I'll just compile an example with your XML as well to give it a try) – hakre Mar 11 '15 at 19:00
  • Actually, I didn't run the iterator with this example, but with a real document that has a more complex structure. I'll test it with the example I posted here to see whether the problem is in my code, and I'll update you. – Ricardo Gonçalves Mar 11 '15 at 19:05
  • @hakre the problem is an error with the autoload of the class: Parse error: syntax error, unexpected T_STATIC in /.../inc/classes/xmliterator/src/XMLBuild.php on line 69. – Ricardo Gonçalves Mar 12 '15 at 21:30
  • Upgrade your PHP version (this requires PHP 5.3), alternatively you could try to replace `static::` on that line with `self::`. – hakre Mar 12 '15 at 22:25
  • @hakre, I've updated PHP and it's now loading the class. The problems now are that it's not updating the right tags (probably some error in my PHP code) and the iterator is closing the document too soon. It's still parsing the rest of the document, but the closing document tag is misplaced. – Ricardo Gonçalves Mar 13 '15 at 10:45

1 Answer

First of all, there is indeed a small problem with the code. I updated XMLReaderIterator to version 0.1.8, which also contains a small fix that is useful in your scenario.

The general problem with the flow in your example is that you don't advance the reading iterator. The parts you meant to replace are therefore still read and written later on, which is why you see them at the end of the document. So it's not enough to write the replacement; you also need to skip over the elements in the reading iterator that you want to replace:

$writer->startElement($node->name);

$node->next();
$iterator->skipNextRead();

$writer->text(sprintf("TEXT #%d", $textCount));
$writer->endElement();

After starting the element, $node->next(); skips all subnodes (children) of the current $node element. This is necessary so that they are not output later on.

Then $iterator->skipNextRead() tells the foreach not to advance once more (next() has already done that, and XMLReader is forward-only). This method is new in XMLWritingIteration as of v0.1.8, so you need the update.
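
To see why both steps are needed, here is a minimal sketch of the same skip-and-replace idea using only the built-in XMLReader and XMLWriter, without the XMLReaderIterator library. It is not how the library implements skipNextRead(); the tiny <doc>/<t>/<u> input is made up for illustration, and attributes are deliberately ignored to keep it short.

```php
<?php
// Hypothetical miniature input: replace the content of <t>, keep everything else.
$xml = '<doc><t>old</t><u>keep</u></doc>';

$reader = new XMLReader();
$reader->XML($xml);

$writer = new XMLWriter();
$writer->openMemory();

$more = $reader->read();
while ($more) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 't') {
        // Write the replacement element ourselves.
        $writer->writeElement('t', 'new');
        // next() jumps over the whole <t> subtree and lands on the node after it.
        // The reader has already advanced, so we must NOT call read() again here.
        $more = $reader->next();
        continue;
    }

    // Copy everything else through (attributes omitted in this sketch).
    switch ($reader->nodeType) {
        case XMLReader::ELEMENT:
            $writer->startElement($reader->name);
            if ($reader->isEmptyElement) {
                $writer->endElement();
            }
            break;
        case XMLReader::TEXT:
            $writer->text($reader->value);
            break;
        case XMLReader::END_ELEMENT:
            $writer->endElement();
            break;
    }
    $more = $reader->read();
}

$result = $writer->outputMemory();
echo $result, "\n"; // <doc><t>new</t><u>keep</u></doc>
```

The `continue` after next() plays the role that skipNextRead() plays in the foreach above: without it, the loop would read once more and silently drop the <u> start tag.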

Whole example (using your example XMLs):

require('xmlreader-iterators.php'); // require XMLReaderIterator library

$xmlInputFile = 'data/worddocument.xml';
$xmlXliffFile = 'data/segments.xliff';

$reader = new XMLReader();
$reader->open($xmlInputFile);

$writer = new XMLWriter();
$writer->openMemory();

$iterator = new XMLWritingIteration($writer, $reader);

$writer->startDocument();

$textCount = 0;
foreach ($iterator as $node) {
    $isElement = $node->nodeType === XMLReader::ELEMENT;

    if ($isElement && $node->name === 'w:t') {
        $textCount++;

        $writer->startElement($node->name);

        $node->next();
        $iterator->skipNextRead();

        $writer->text(sprintf("TEXT #%d", $textCount));
        $writer->endElement();
    } else {
        // handle everything else
        $iterator->write();
    }
}

$writer->endDocument();
echo $writer->outputMemory(true);

Output:

<?xml version="1.0"?>
<w:document xmlns:ve="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml">
    <w:body>
        <w:p w:rsidR="000C0514" w:rsidRPr="004F10D0" w:rsidRDefault="004F10D0" w:rsidP="004F10D0">
            <w:pPr>
                <w:rPr>
                    <w:b/>
                </w:rPr>
            </w:pPr>
            <w:r w:rsidRPr="004F10D0">
                <w:rPr>
                    <w:b/>
                </w:rPr>
                <w:t>TEXT #1</w:t>
            </w:r>
        </w:p>
        ...
        <w:p w:rsidR="000C0514" w:rsidRPr="009D4166" w:rsidRDefault="004F10D0" w:rsidP="004F10D0">
            <w:pPr>
                <w:rPr>
                    <w:b/>
                </w:rPr>
            </w:pPr>
            <w:r w:rsidRPr="009D4166">
                <w:rPr>
                    <w:b/>
                </w:rPr>
                <w:t>TEXT #2</w:t>
            </w:r>
        </w:p>
        <w:sectPr w:rsidR="000C0514" w:rsidRPr="009D4166" w:rsidSect="004F10D0">
            <w:headerReference w:type="even" r:id="rId7"/>
            <w:pgSz w:w="11905" w:h="16840"/>
            <w:pgMar w:top="1417" w:right="1701" w:bottom="1417" w:left="1701" w:header="0" w:footer="1305" w:gutter="0"/>
            <w:cols w:space="720"/>
        </w:sectPr>
    </w:body>
</w:document>

I think this is closer to the output you're trying to achieve. If the xliff file isn't that large, it's probably better not to parse it with XMLReader but with SimpleXMLElement or DOMDocument. Both support XPath, which should be very handy for looking up the IDs in it and gathering the matching content quickly.
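
As a rough sketch of that suggestion, the lookup could be done with SimpleXMLElement::xpath(). The XLIFF below is abridged to the target of trans-unit 179 from the question, and the helper name findTargetText() is made up for this example. Also note that XMLReader::getAttribute() returns a string, so a strict comparison with an integer counter such as `$gid === $t` never matches; the sketch avoids that by taking an integer id and building the XPath expression from it.

```php
<?php
// Abridged sample: only the <target> of trans-unit 179 from segments.xliff.
$xliff = <<<'XML'
<?xml version="1.0"?>
<xliff>
 <file datatype="x-noveritis" source-language="pt-BR">
  <body>
   <trans-unit id="179">
    <target><mrk mtype="seg" id="3"><g id="218">tradução</g><g id="219">teste</g><g id="220">, segmento 3</g></mrk> </target>
   </trans-unit>
  </body>
 </file>
</xliff>
XML;

$doc = simplexml_load_string($xliff);

/**
 * Hypothetical helper: return the text of the first <g id="..."> found
 * inside any <target>, or null if there is no such element.
 */
function findTargetText(SimpleXMLElement $doc, int $id): ?string
{
    // %d keeps the expression safe: only an integer can end up in it.
    $hits = $doc->xpath(sprintf('//target//g[@id="%d"]', $id));

    return empty($hits) ? null : (string) $hits[0];
}

var_dump(findTargetText($doc, 219)); // string(5) "teste"
var_dump(findTargetText($doc, 999)); // NULL
```

Inside the foreach, the result could then be passed to $writer->text() in place of the "TEXT #%d" placeholder, falling back to the original text when the helper returns null. Be aware that the same g id can occur in more than one trans-unit in the question's file, in which case this sketch returns the first match only.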

hakre
  • It's working now. Next I will test the comparison with the other XML file, to get the node with the same id and write its content in `$writer->text(sprintf("TEXT #%d", $textCount));`. Any ideas? – Ricardo Gonçalves Mar 17 '15 at 15:37