I'm trying to parse the following XML with Nokogiri to extract the lat/long pair under //ns2:Point/ns2:pos, but without much luck.

<?xml version="1.0" encoding="UTF-8"?>
<ns1:XLS ns1:lang="en" rel="5.2.sp03" version="1.0" xmlns:ns1="http://www.opengis.net/xls">
    <ns1:ResponseHeader sessionID="wrx-rails1370997540"/>
    <ns1:Response numberOfResponses="1" requestID="10" version="1.0">
        <ns1:GeocodeResponse>
            <ns1:GeocodeResponseList numberOfGeocodedAddresses="1">
                <ns1:GeocodedAddress>
                    <ns2:Point xmlns:ns2="http://www.opengis.net/gml">
                        <ns2:pos>38.898331 -77.117273</ns2:pos>
                    </ns2:Point>
                    <ns1:Address countryCode="US">
                        <ns1:StreetAddress>
                            <ns1:Building number="4400"/>
                            <ns1:Street>Lee Hwy</ns1:Street>
                        </ns1:StreetAddress>
                        <ns1:Place type="CountrySubdivision">VA</ns1:Place>
                        <ns1:Place type="CountrySecondarySubdivision">Arlington</ns1:Place>
                        <ns1:Place type="MunicipalitySubdivision">Arlington</ns1:Place>
                        <ns1:PostalCode>22207</ns1:PostalCode>
                    </ns1:Address>
                    <ns1:GeocodeMatchCode accuracy="1.0" matchType="ADDRESS POINT LOOKUP"/>
                    <ns1:SpatialKeys>
                        <ns1:SpatialKey priority="0" val="1663355010"/>
                        <ns1:SpatialKey priority="1" val="2563322400"/>
                        <ns1:SpatialKey priority="2" val="3325185160"/>
                        <ns1:SpatialKey priority="3" val="3784086306"/>
                        <ns1:SpatialKey priority="4" val="4033029320"/>
                        <ns1:SpatialKey priority="5" val="4162373938"/>
                        <ns1:SpatialKey priority="6" val="4228264524"/>
                        <ns1:SpatialKey priority="7" val="4261514387"/>
                        <ns1:SpatialKey priority="8" val="4278215460"/>
                        <ns1:SpatialKey priority="9" val="4286585033"/>
                        <ns1:SpatialKey priority="10" val="4290774578"/>
                        <ns1:SpatialKey priority="11" val="4292870540"/>
                        <ns1:SpatialKey priority="12" val="4293918819"/>
                        <ns1:SpatialKey priority="13" val="4294443032"/>
                        <ns1:SpatialKey priority="14" val="4294705158"/>
                        <ns1:SpatialKey priority="15" val="4294836224"/>
                    </ns1:SpatialKeys>
                </ns1:GeocodedAddress>
            </ns1:GeocodeResponseList>
        </ns1:GeocodeResponse>
    </ns1:Response>
</ns1:XLS>

I get back an empty array when I try the following:

doc = Nokogiri::XML(response.body)
pos = doc.xpath('//ns2:Point/ns2:pos')

However, I can access the GeocodedAddress element just fine using:

doc.xpath('//ns1:GeocodeResponseList/ns1:GeocodedAddress')

Any clues as to what I'm missing here? Is it the change of namespace that it doesn't like for some reason?

My environment is as follows: Nokogiri 1.5.9 (Java), Rails 3.2.11, JRuby 1.7.4, Windows 7.

1 Answer

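Your first expression works because ns1 is declared on the root element, and Nokogiri automatically registers the namespaces it finds there. The ns2 namespace is declared further down, on the ns2:Point element, so Nokogiri doesn't know about the ns2 prefix when it evaluates your XPath.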

There are multiple ways to deal with this. The first is to gather the namespaces in the document and pass them to Nokogiri when you do your search. Nokogiri does this automatically for namespaces declared on the XML root, but not for ones sprinkled throughout the document, so we have to collect them from the whole document with collect_namespaces, then pass them in:

namespaces = doc.collect_namespaces
namespaces # => {"xmlns:ns1"=>"http://www.opengis.net/xls", "xmlns:ns2"=>"http://www.opengis.net/gml"}
pos = doc.xpath('//ns2:Point/ns2:pos', namespaces)
pos # => [#<Nokogiri::XML::Element:0x3fe8c608ab30 name="pos" namespace=#<Nokogiri::XML::Namespace:0x3fe8c608aacc prefix="ns2" href="http://www.opengis.net/gml"> children=[#<Nokogiri::XML::Text:0x3fe8c608e1b8 "38.898331 -77.117273">]>]

An alternative is to tell Nokogiri to remove all namespaces from the document. You only want to do that if you're sure there are no collisions between tag names found in the various namespaces in the document:

doc.remove_namespaces!
pos = doc.xpath('//Point/pos')
pos # => [#<Nokogiri::XML::Element:0x3fe8c608ab30 name="pos" children=[#<Nokogiri::XML::Text:0x3fe8c608e1b8 "38.898331 -77.117273">]>]

The Nokogiri documentation has this to say about the use of remove_namespaces!:

But I’m Lazy and Don’t Want to Deal With Namespaces!

Lazy == Efficient, so no judgements. :)

If you have an XML document with namespaces, but would prefer to ignore them entirely (and query as if Tim Bray had never invented them), then you can call remove_namespaces! on an XML::Document to remove all namespaces. Of course, if the document had nodes with the same names but different namespaces, they will now be ambiguous. But you’re lazy! You don’t care!
