Is there a method similar to doc.at_xpath('//Node/Childtag').content for Nokogiri::XML::SAX::Document?

I have XML like:

<accession>Police-1234</accession>
<accession>Police-6574</accession>    
<police>
    <privateCar>
        <fullName>BMW 750Li</fullName>
    </privateCar>
    <officeCar>
        <fullName>Ford Mustang GT</fullName>
    </officeCar>
    <optional>
        <fullName>Porsche carrera 511</fullName>
    </optional>
</police>

My code is something like:

require 'rubygems'
require 'nokogiri'

include Nokogiri

class PostCallbacks < XML::SAX::Document
  def initialize
    @in_title = false
    @in_title2 = false
  end

  def start_element(element, attributes)
    @attrs = attributes
    @content = ''
    @in_title = element.eql?("accession")
    # Collecting all the other nodes/tags
    @in_title2 = element.eql?("fullName")
  end

  def end_document
    # puts "Here is where the attributes could be played with"
  end

  def characters(string)
    string.strip!
    if @in_title and !string.empty?
      puts "Accession: #{string}"
    elsif @in_title2 and !string.empty?
      puts "Full Name: #{string}"
    end

    @content << string if @content
  end
end


parser = XML::SAX::Parser.new(PostCallbacks.new)
parser.parse(File.open(ARGV[0]))

My results are:

Accession: Police-1234
Accession: Police-6574

Full Name: BMW 750Li
Full Name: Ford Mustang GT
Full Name: Porsche carrera 511

Now I have two questions.

  1. How do I collect only the "accession" element with the value "Police-1234"?
  2. How do I retrieve only the full name under privateCar? That is, I want only BMW 750Li as my result.

For the first point, I generally use doc.xpath('//accession').first to pull out the first entry in the XML.

For the second point, I know I can select it using XPath with doc.at_xpath('//police/privateCar/fullName'), but is there something similar for the SAX parser?

I am using SAX because the XML file I need to parse is large.

the Tin Man
A1aks

1 Answer

The short answer is no, there is no similar functionality in SAX.

The key is the difference between SAX parsing and DOM parsing. Normally, when we use Nokogiri, we are working with documents that are small enough to fit into memory and be parsed into a DOM ("document object model"). That has huge advantages for iterating over the document and searching it, because we can rewind and search from the top of the document as often as we want without penalty. And, because it's all in memory, it's easy to tell the parser to find a particular node based on a path of nodes; it's all there for the parser to walk through, finding the particular landmarks we've specified.

SAX ("Simple API for XML") processing occurs serially, from the top of the document's stream through to its end, and, as each tag is opened or closed, we are given the opportunity to do something with its name and attributes. Instead of searching using an XPath or CSS selector, we have to look for the tag's name as we get tag-open events, set flags to remember that we've seen it, then look for the subsequent tag names as they're opened, until we get to the desired content.

SAX is an entirely different way to process a document, but its advantage is that it is much more memory efficient, because the document is never held in memory all at once.

the Tin Man