
Context: I'm parsing an XML file using the libxml-ruby gem. I need to query the XML document for a set of nodes using the XPath find method. I then need to process each node individually, querying them once again using the XPath find method.

Issue: When I attempt to query the returned nodes individually, the XPath find method is querying the entire document rather than just the node:

Code Example:

require 'xml'

string = %{<?xml version="1.0" encoding="iso-8859-1"?>
<bookstore>
  <book>
    <title lang="eng">Harry Potter</title>
    <price>29.99</price>
  </book>
  <book>
    <title lang="eng">Learning XML</title>
    <price>39.95</price>
  </book>
</bookstore>}

xml = XML::Parser.string(string, :encoding => XML::Encoding::ISO_8859_1).parse
books = xml.find("//book")
books.each do |book|
    price = book.find("//price").first.content
    puts price
end

This script prints 29.99 twice. I think this must have something to do with setting the XPath context, but I have not figured out how to accomplish that yet.

Edited by the Tin Man; asked by Andrew Kirk.
  • I'd highly recommend using Nokogiri for your XML parsing. It's the de facto standard for Ruby. – the Tin Man Jun 12 '13 at 23:16
  • I actually started out using Nokogiri and ran into the same exact problem. I switched to libxml-ruby hoping that things would be different there but the same issue persists. – Andrew Kirk Jun 12 '13 at 23:18
  • Well... when the problem follows you, you know it's not in the library. :-) Been there, done it too many times to remember. Stick with Nokogiri; it rocks. – the Tin Man Jun 12 '13 at 23:25

1 Answer


The first problem I see is book.find("//price").

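//price means "start at the top of the document and look downward." That's most certainly NOT what you want to do. Instead, I think you want to look inside book for the first price. A relative path such as price or .//price is evaluated against the node you call find on, so it only looks inside that book.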

Using Nokogiri, I'd use CSS selectors because they're easier on the eyes and can usually accomplish the same thing:

require 'nokogiri'

string = %{<?xml version="1.0" encoding="iso-8859-1"?>
<bookstore>
  <book>
    <title lang="eng">Harry Potter</title>
    <price>29.99</price>
  </book>
  <book>
    <title lang="eng">Learning XML</title>
    <price>39.95</price>
  </book>
</bookstore>}

xml = Nokogiri::XML(string)
books = xml.search("book")
books.each do |book|
    price = book.at("price").content
    puts price
end

After running that I get:

29.99
39.95
Answered by the Tin Man.
  • Ah, you're right. I had incorrectly assumed that would cause the query to start at the top of the node rather than the document. The issue is resolved by simply removing the "//" in front of price. Thanks for your help! – Andrew Kirk Jun 12 '13 at 23:26
  • Correct. That's one of the reasons I prefer CSS. The slashes in XPath make my brain tired. – the Tin Man Jun 12 '13 at 23:28