
I have an XML file containing entries like these:

    <page>
        <title>test</title>
        <text>bla bla</text>
    </page>
    <page>
        <title>another test</title>
        <text>bla bla</text>
    </page>
    <page>
        <title>hello</title>
        <text>hello world</text>
    </page>

I want to parse the file with PHP's SAX parser (the xml_* functions) to find the page whose title is "hello", and then save the content of the <text> element that belongs to it. Here is what I have so far:

    $pages = array();
    $elements = null;

    function startElements($parser, $name, $attrs) {
        global $elements;

        if (!empty($name)) {
            $elements = $name;
        }
    }

    function endElements($parser, $name) {
        global $elements;

        if (!empty($name)) {
            $elements = null;
        }
    }

    function characterData($parser, $data) {
        global $pages, $elements;

        if (!empty($data)) {
            // Element names are upper-cased by default (case folding)
            if ($elements == 'TITLE') {
                // Case-insensitive, so that the title "hello" matches
                if (preg_match('/hello/i', $data) === 1) {

                    // I found the page with the right title, but how do I get
                    // the content of the <text> tag that follows it?

                }
            }
        }
    }

    $parser = xml_parser_create();

    xml_set_element_handler($parser, "startElements", "endElements");
    xml_set_character_data_handler($parser, "characterData");

    if (!($handle = fopen('tmp.xml', "r"))) {
        die("could not open XML input");
    }

    // Feed the file in 4 KB chunks; the last chunk is flagged as final
    while ($data = fread($handle, 4096)) {
        if (!xml_parse($parser, $data, feof($handle))) {
            die(sprintf("XML error: %s at line %d",
                xml_error_string(xml_get_error_code($parser)),
                xml_get_current_line_number($parser)));
        }
    }
    fclose($handle);

Any ideas on how to get the content of the <text> tag for a page with a specific title? I could get the result I need by saving all the data in an array and then searching it, but I'd like a better solution.

Thank you.

lady_OC

1 Answer


OK, I found a solution. It is not based on the SAX parser as I originally wanted, but it is still suited to large files: combine SimpleXML (a tree-based parser, like DOM) with XMLReader (a stream-based pull parser). SimpleXML gives easy access to child nodes.

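XMLReader moves through the document one node at a time, so the whole file is never loaded into memory. When the reader is positioned on an element, expand() converts that node (with its subtree) into a DOM node, which simplexml_import_dom() can then turn into a SimpleXMLElement.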
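A minimal sketch of that combination follows. It assumes the <page> elements are children of a single root element (the fragment in the question has several top-level <page> elements, so a real file needs a wrapper; <pages> here is hypothetical) and that an exact, case-sensitive title match is wanted. The function name findPageText is illustrative only. The reader jumps from one <page> to the next with next('page'), and only the current page is expanded into DOM and viewed through SimpleXML:

```php
<?php
// Sketch: stream a large XML file with XMLReader and hand one <page> at a
// time to SimpleXML. Assumes <page> elements are children of a single root
// and each contains <title> and <text>. findPageText is a hypothetical name.

function findPageText(string $path, string $wantedTitle): ?string
{
    $reader = new XMLReader();
    if (!$reader->open($path)) {
        throw new RuntimeException("could not open XML input: $path");
    }

    // Owner document for the nodes copied out of the reader.
    $doc = new DOMDocument();

    try {
        // Advance to the first <page> start tag.
        while ($reader->read() && $reader->name !== 'page') {
        }

        while ($reader->name === 'page') {
            // expand() builds a DOM node for the current element and its subtree.
            $node = $reader->expand();
            if ($node === false) {
                throw new RuntimeException('could not expand <page>');
            }
            // Import it into $doc, then view it through SimpleXML.
            $page = simplexml_import_dom($doc->importNode($node, true));

            if ((string) $page->title === $wantedTitle) {
                return (string) $page->text;
            }

            // Skip this page's subtree and move to the next sibling <page>.
            $reader->next('page');
        }
    } finally {
        $reader->close();
    }

    return null;
}

// Usage with the sample data wrapped in a hypothetical <pages> root element.
$xml = <<<XML
<pages>
    <page>
        <title>test</title>
        <text>bla bla</text>
    </page>
    <page>
        <title>hello</title>
        <text>hello world</text>
    </page>
</pages>
XML;

$path = tempnam(sys_get_temp_dir(), 'pages');
file_put_contents($path, $xml);
echo findPageText($path, 'hello'), PHP_EOL; // prints "hello world"
unlink($path);
```

Only one <page> is materialized at a time, so memory use stays small however large the file is, and the function stops reading as soon as the title is found.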

Details for combining both can be found here: http://www.ibm.com/developerworks/library/x-xmlphp2/

I hope this helps someone else.
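If you would rather stay with the xml_* (SAX) functions from the question, one common approach is to keep state between callbacks: note which element is currently open, accumulate the <title> and <text> character data of the current page (character data can arrive in several chunks), and decide at </page> whether the page matched. This is a minimal sketch, assuming PHP 8, a single root element around the pages, the parser's default case folding (element names arrive as PAGE, TITLE, TEXT) and an exact title match; PageFinder and findPageTextSax are hypothetical names:

```php
<?php
// Sketch: keep the xml_* (SAX) approach, but remember state between callbacks
// instead of buffering every page. Requires PHP 8. PageFinder and
// findPageTextSax are hypothetical names. With the default case folding,
// element names arrive upper-cased (PAGE, TITLE, TEXT).

final class PageFinder
{
    private string $current = '';   // name of the element we are inside
    private string $title = '';     // accumulated <title> of the current page
    private string $text = '';      // accumulated <text> of the current page
    public ?string $result = null;  // <text> of the first matching page

    public function __construct(private string $wantedTitle)
    {
    }

    public function start($parser, string $name, array $attrs): void
    {
        $this->current = $name;
        if ($name === 'PAGE') {
            // New page: forget the previous one.
            $this->title = '';
            $this->text = '';
        }
    }

    public function end($parser, string $name): void
    {
        // The whole page has been seen, so title and text are both complete.
        if ($name === 'PAGE' && $this->result === null
            && trim($this->title) === $this->wantedTitle) {
            $this->result = $this->text;
        }
        $this->current = '';
    }

    public function characters($parser, string $data): void
    {
        // Character data may be delivered in several chunks, so append.
        if ($this->current === 'TITLE') {
            $this->title .= $data;
        } elseif ($this->current === 'TEXT') {
            $this->text .= $data;
        }
    }
}

function findPageTextSax(string $path, string $wantedTitle): ?string
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("could not open XML input: $path");
    }

    $finder = new PageFinder($wantedTitle);
    $parser = xml_parser_create();
    xml_set_element_handler($parser, [$finder, 'start'], [$finder, 'end']);
    xml_set_character_data_handler($parser, [$finder, 'characters']);

    try {
        while (($chunk = fread($handle, 4096)) !== false && $chunk !== '') {
            if (xml_parse($parser, $chunk, false) === 0) {
                throw new RuntimeException(sprintf(
                    'XML error: %s at line %d',
                    xml_error_string(xml_get_error_code($parser)),
                    xml_get_current_line_number($parser)
                ));
            }
            if ($finder->result !== null) {
                return $finder->result; // found it: stop reading the file
            }
        }
        xml_parse($parser, '', true); // signal the end of the input
    } finally {
        fclose($handle);
    }

    return $finder->result;
}

// Usage with the sample data wrapped in a hypothetical <pages> root element.
$path = tempnam(sys_get_temp_dir(), 'pages');
file_put_contents($path, "<pages>\n<page><title>test</title><text>bla bla</text></page>\n"
    . "<page><title>hello</title><text>hello world</text></page>\n</pages>\n");
echo findPageTextSax($path, 'hello'), PHP_EOL; // prints "hello world"
unlink($path);
```

Deciding at </page> rather than inside the character-data handler means the order of <title> and <text> within a page does not matter, and nothing beyond the current page is kept in memory.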

lady_OC