I want to find an element like this:

xml1 = '<period>2017-02-10</period>'

or

xml2 = <<XML
<root xmlns:xbrli="http://www.w3.org/1999/xhtml">
  <xbrli:period>2017-02-10</xbrli:period>
</root>
XML

I can select the element by:

  require 'nokogiri'

  # Returns the <period> element, with or without the xbrli prefix.
  def period_from_xml(xml)
    doc = Nokogiri::XML(xml)
    if doc.namespaces.key?('xmlns:xbrli')
      doc.at_css("xbrli|period")
    else
      doc.at_css("period")
    end
  end

  period_from_xml(xml1)
  # => <period>2017-02-10</period>
  period_from_xml(xml2)
  # => <xbrli:period>2017-02-10</period>

I know about Nokogiri::XML::Document#remove_namespaces!, but I don't want to use it because I need the namespaces elsewhere in my code.

Would it be a good idea to duplicate the document and keep both doc and doc_without_namespaces?

Is there a simpler way to handle this situation?

ironsand
  • Please read "[mcve]". Your input XML sample needs to be better as it's missing the namespace declarations you're trying to find. – the Tin Man Feb 13 '17 at 22:17

1 Answer

I'd use this:

require 'nokogiri'

xml = <<EOT
<root xmlns:xbrli="http://www.w3.org/1999/xhtml">
  <period>2017-02-10</period>
  <xbrli:period>2017-02-11</xbrli:period>
</root>
EOT

doc = Nokogiri::XML(xml)

doc.search('period,xbrli|period').map(&:text) # => ["2017-02-10", "2017-02-11"]

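In CSS, 'period,xbrli|period' means: find "period" or "xbrli:period". The comma separates two alternative selectors, and the pipe separates a namespace prefix from the element name.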

See also "How to avoid joining all text from Nodes when scraping".

the Tin Man
  • Sorry, my first question had not enough information about what I want to know. I edited my question. – ironsand Feb 17 '17 at 03:26