I'm using Nokogiri::XML to parse responses from Amazon SimpleDB. The response is something like:

<SelectResponse xmlns="http://sdb.amazonaws.com/doc/2007-11-07/">
  <SelectResult>
    <Item>
      <Attribute><Name>Foo</Name><Value>42</Value></Attribute>
      <Attribute><Name>Bar</Name><Value>XYZ</Value></Attribute>
    </Item>
  </SelectResult>
</SelectResponse>

If I just hand the response straight over to Nokogiri, all XPath queries (e.g. doc/"//Item/Attribute[Name='Foo']/Value") return an empty array. But if I remove the xmlns attribute from the SelectResponse tag, it works perfectly.

Is there some extra thing I need to do to account for the namespace declaration? This workaround feels horribly like a hack.

the Tin Man
Mark Rendle

2 Answers

That XPath query looks for elements that are not in any namespace. You need to tell your XPath processor that you are looking for elements in the http://sdb.amazonaws.com/doc/2007-11-07/ namespace.

One way to do that with Nokogiri is:

doc = Nokogiri::XML.parse(...)
doc.xpath("//aws:Item/aws:Attribute[aws:Name='Foo']/aws:Value", {"aws" => "http://sdb.amazonaws.com/doc/2007-11-07/"})
the Tin Man
hrnt

I found "Namespaces in XML" really helpful in understanding what's going on.

Basically, if the document declares a default namespace via xmlns=, its elements belong to that namespace, so every element name in your XPath searches must be namespace-qualified too.

So in your case, you could do one of three things:

  • Remove the xmlns attribute from the root SelectResponse element. In that case your original, namespace-less XPath query will work.

  • Use the default namespace in your XPath query:

    doc/"//xmlns:Item/xmlns:Attribute[xmlns:Name='Foo']/xmlns:Value"
    
  • Define a custom namespace in the second argument of the xpath method and use that in your query, as shown in hrnt's solution above.

the Tin Man
Matt Zukowski
  • There's a `remove_namespaces!` method [documented here](http://nokogiri.org/Nokogiri/XML/Document.html#method-i-remove_namespaces%21). – RobinGower Oct 04 '11 at 22:17
  • @RobinGower Yes, and it says `For more information on why this probably is not a good thing in general, please direct your browser to` [tenderlovemaking.com/2009/04/23/namespaces-in-xml/](http://tenderlovemaking.com/2009/04/23/namespaces-in-xml) – nurettin Jul 16 '12 at 12:14
  • Both the links in the comments are outdated. Here's an updated doc link for [remove_namespaces!](http://www.rubydoc.info/github/sparklemotion/nokogiri/Nokogiri/XML/Document:remove_namespaces!) – Jason Jul 29 '15 at 17:38