How can I remove all comments from a large XML file?

I have a large XML file that I want to slim down by removing all the comments. The file is over 200 MB, and it takes a long time to parse and query.

The parsing code is:

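<?php

// $hotelCodes is assumed to be defined earlier in the script.
$dom    = new DOMDocument();
$xpath  = new DOMXPath($dom);
$reader = new XMLReader();
$reader->open('http://www.bookingassist.ro/test/HotelsPro.xml');

// Stream through the file and expand one <Hotel> element at a time.
while ($reader->read()) {
    if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'Hotel') {
        // Attach the expanded node to $dom so the XPath query can run against it.
        $node = $dom->importNode($reader->expand(), true);
        $dom->appendChild($node);
        $result = $xpath->evaluate('string(self::Hotel[HotelCode = "'.$hotelCodes[3].'"]/HotelImages/ImageURL[1])', $node);
        // Detach the node again so memory use stays flat.
        $dom->removeChild($node);
        if ($result) {
            echo $result;
        }
    }
}
?>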
ThW
Razvan Baba
  • What is your technology? e.g. .NET, XSLT, or what? – fly_ua Dec 29 '14 at 12:43
  • 200MB is not a big XML file... and removing comments won't reduce parse time much unless it's mostly comments. You need to look at your parsing code - is it a SAX parser or a DOM (i.e. you're reading the entire lot into memory) – Ben Dec 29 '14 at 12:45

1 Answer

Assuming XSLT is an option, you can use a modified version of the identity transform that emits nothing for any comment it matches:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="comment()"/>

</xsl:stylesheet>
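
Since the question uses PHP, here is a minimal sketch of how a stylesheet like this could be applied there, assuming the xsl extension (XSLTProcessor) is installed. The sample XML string is made up for illustration. Note that this route loads the whole document into memory, so it may be heavy for a file of 200 MB.

```php
<?php
// Sketch: apply a comment-stripping identity transform with PHP's ext-xsl.
// The sample XML and variable names are illustrative assumptions.

$xsl = <<<'XSL'
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="comment()"/>
</xsl:stylesheet>
XSL;

$xml = '<Hotels><!-- note --><Hotel><HotelCode>A1</HotelCode><!-- x --></Hotel></Hotels>';

// Load the stylesheet and the source document as DOM trees.
$style = new DOMDocument();
$style->loadXML($xsl);

$source = new DOMDocument();
$source->loadXML($xml);          // for a real file: $source->load('input.xml');

$proc = new XSLTProcessor();
$proc->importStylesheet($style);

// Returns the transformed document as a string.
$output = $proc->transformToXml($source);
echo $output;
```

To write straight to disk instead of building a string, `$proc->transformToUri($source, 'output.xml')` can be used in place of `transformToXml()`.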

Fiddle here
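
If loading the whole document is too expensive, a streaming copy is a different technique that may fit a large file better. The sketch below is not the XSLT approach: it reads nodes with XMLReader and re-emits them with XMLWriter, skipping comments. `stripComments()` is a hypothetical helper name, and the sample input is made up. DOCTYPE declarations and entity references are not handled, and prefixed names are copied verbatim.

```php
<?php
// Sketch: copy an XML stream node by node, dropping comment nodes.
// stripComments() is a hypothetical helper, not a library function.

function stripComments(XMLReader $reader, XMLWriter $writer): void
{
    while ($reader->read()) {
        switch ($reader->nodeType) {
            case XMLReader::ELEMENT:
                $writer->startElement($reader->name);
                // Remember this before the cursor moves onto attributes.
                $isEmpty = $reader->isEmptyElement;
                if ($reader->hasAttributes) {
                    while ($reader->moveToNextAttribute()) {
                        $writer->writeAttribute($reader->name, $reader->value);
                    }
                    $reader->moveToElement();
                }
                // Self-closing elements produce no END_ELEMENT event.
                if ($isEmpty) {
                    $writer->endElement();
                }
                break;
            case XMLReader::END_ELEMENT:
                $writer->endElement();
                break;
            case XMLReader::TEXT:
            case XMLReader::WHITESPACE:
            case XMLReader::SIGNIFICANT_WHITESPACE:
                $writer->text($reader->value);   // text() re-escapes &, <, >
                break;
            case XMLReader::CDATA:
                $writer->writeCdata($reader->value);
                break;
            case XMLReader::PI:
                $writer->writePi($reader->name, $reader->value);
                break;
            case XMLReader::COMMENT:
                break;                           // comments are simply not copied
        }
    }
}

// Small in-memory demonstration with made-up data.
$xml = '<Hotels><!-- a --><Hotel id="1"><Name>X &amp; Y</Name><!-- b --><Empty/></Hotel></Hotels>';

$reader = new XMLReader();
$reader->XML($xml);

$writer = new XMLWriter();
$writer->openMemory();

stripComments($reader, $writer);
$result = $writer->outputMemory();
echo $result;
```

For a file on disk, `$reader->open('input.xml')` and `$writer->openUri('output.xml')` would replace the in-memory calls, followed by `$writer->flush()` at the end. Only one node is held at a time, so memory use should stay roughly constant regardless of file size.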

StuartLC