
I am trying to use Nokogiri to parse XML that I fetch from a URL, but I am not able to build an array from it that would be accessible throughout the project.

My XML:

<component name="Hero">
    <topic name="i1">
      <subtopic name="">
          <links>
            <link Dur="" Id="" type="article">
                <label>I am here First. </label>

    <topic name="i2">
      <subtopic name="">
          <links>
            <link Dur="" Id="" type="article">
                <label>I am here Fourth. </label>
                <label>I am here Sixth. </label>
    <topic name="i3">
      <subtopic name="">
          <links>
            <link Dur="" Id="" type="article">
                <label>I am here Fourth. </label>

I am planning to create an array for each topic, containing the labels nested inside it. For example:

hro_array = ["I am here First.", "I am here Second.", "I am here Third."]

born2Learn
  • Welcome to Stack Overflow. Please create a minimal example of the XML that is syntactically correct. Anything beyond minimal or that doesn't work only slows our ability to help you, or worse, keeps us from helping you. What code have you written to solve this? It's easier for us to fix it than it is to write something from scratch and explain how to use it. – the Tin Man May 06 '15 at 19:01
  • What Nokogiri code do you have so far? – tadman May 06 '15 at 19:03
  • `require 'open-uri' doc = Nokogiri::HTML(open("url")) ` – born2Learn May 06 '15 at 19:04

1 Answer


Assuming your XML is well formed and valid (proper closing of nested tags, etc.), you simply need to fetch the contents of the URL (e.g. using the built-in open-uri) and then use an XML querying technique (e.g. XPath) to retrieve the desired data.

For example, assuming you want a hash mapping each topic name to a list of its nested labels:

require 'open-uri'
require 'nokogiri'

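# Builds a hash mapping each topic's name attribute to the text of its nested labels.
def topic_label_hash(doc)
  doc.xpath('//topic').each_with_object({}) do |topic, hash|
    # The leading '.' restricts the search to labels inside the current topic.
    labels = topic.xpath('.//label/text()').map(&:to_s)
    name = topic.attr('name')
    hash[name] = labels
  end
end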

xml = URI.open(my_url) # open-uri's URI.open; a bare open(url) no longer fetches URLs on Ruby 3.0+
doc = Nokogiri::XML(xml)
topic_label_hash(doc) # =>
# {
#   "TV" => [
#     "I am here First. ",
#     "I am here Second. ",
#     "I am here Third. ",
#     ...
#   ],
#   "Internet" => [
#     "I am here Fourth. ",
#     "I am here Fifth. ",
#     "I am here Sixth. "
#   ],
#   ...
# }
maerics
  • Awesome, it's creating the hash of arrays. Now I am working out how to use it and access an individual array from the created hash: `full_hash = topic_label_hash(doc)`, then `full_hash["TV"]` to access only the TV array and `full_hash["TV"][0]` to access an element. – born2Learn May 06 '15 at 19:26
  • That's really awesome help, @maerics. I really appreciate it. Thanks a ton. – born2Learn May 06 '15 at 19:29
  • @beginner_yaml_user: yup, you got it. Glad it helped you out. – maerics May 06 '15 at 19:31
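
The access pattern confirmed in the comments can be sketched as follows, along with one possible way to make the result reachable from anywhere in a project. The hash literal stands in for the return value of `topic_label_hash(doc)`, and the `TopicLabels` module and its `labels_for` method are hypothetical names introduced only for illustration; they are not part of Nokogiri or of the answer above.

```ruby
# Stand-in for topic_label_hash(doc); keys and labels are illustrative.
full_hash = {
  "TV"       => ["I am here First. ", "I am here Second. ", "I am here Third. "],
  "Internet" => ["I am here Fourth. ", "I am here Fifth. ", "I am here Sixth. "]
}

tv_labels = full_hash["TV"]     # the whole array for one topic
first_tv  = full_hash["TV"][0]  # a single label

# full_hash["Radio"][0] would raise NoMethodError because the key is missing;
# dig returns nil instead.
missing_first = full_hash.dig("Radio", 0)

# Hypothetical holder so the parsed data can be read from anywhere in the project.
module TopicLabels
  class << self
    attr_writer :data

    def data
      @data ||= {}
    end

    # Returns the labels for a topic, or an empty array if the topic is unknown.
    def labels_for(topic)
      data.fetch(topic, [])
    end
  end
end

TopicLabels.data = full_hash.freeze

puts TopicLabels.labels_for("Internet").last
puts TopicLabels.labels_for("Radio").inspect
```

Assigning the data once at startup and freezing it keeps the shared hash from being modified by accident; in a Rails application an initializer would be one natural place for that assignment.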