
I am using Ruby 1.9.3. Here is a simplified version of the XML page I want to get information from. I need to access it on a secure website that requires login credentials. I can't use Nokogiri because I wasn't able to log into the website with it.

<root>
  <person>
    <name>Jack</name>
    <age>10</age>
  </person>
  <person>
    <name>Jones</name>
  </person>
  <person>
    <name>Jon</name>
    <age>16</age>
  </person>
</root>

As you can see, the age tag is sometimes missing. Using REXML with Ruby, I use the following code:

require 'mechanize'
require 'rexml/document'

agent = Mechanize.new
xml = agent.get("https://securewebsite.com/page.xml")
document = REXML::Document.new(xml.body)

name = REXML::XPath.match(document, "//person/name").map {|x| x.text}
# => ["Jack", "Jones", "Jon"]

age = REXML::XPath.match(document, "//person/age").map {|x| x.text}
# => ["10", "16"]

The problem is that I can't associate each age with the correct name because the indices no longer line up. For example, name[1] is Jones but age[1] is 16, which is wrong: the person element for Jones has no age tag at all.
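To make the mismatch concrete, here is a small sketch that runs the two independent queries from the question against the sample XML (inlined as a string instead of fetched with Mechanize) and pairs the results by position with zip. Because the second array is shorter, every age after the missing one shifts onto the wrong name.

```ruby
require 'rexml/document'

# Sample XML from the question, inlined so the sketch runs without Mechanize
xml = <<XML
<root>
  <person><name>Jack</name><age>10</age></person>
  <person><name>Jones</name></person>
  <person><name>Jon</name><age>16</age></person>
</root>
XML

document = REXML::Document.new(xml)

# Two independent queries, exactly as in the question
names = REXML::XPath.match(document, "//person/name").map { |x| x.text }
ages  = REXML::XPath.match(document, "//person/age").map { |x| x.text }

# zip pairs elements by position, so the missing <age> shifts everything after it
pairs = names.zip(ages)
p pairs # => [["Jack", "10"], ["Jones", "16"], ["Jon", nil]]
```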

Is there any way to get the age array to come out as ["10", nil, "16"], so that I can match each name with its corresponding age?

Or is there a better way? Let me know if further explanation is required.

Redson
  • if you are getting the xml, can't you just pass that into nokogiri directly at that point by using `Nokogiri::XML(xml)`? – Alexis Andersen Feb 03 '15 at 17:06
  • @DaneAndersen What do I do once I pass it to nokogiri? – Redson Feb 03 '15 at 17:07
  • Nokogiri doesn't have any idea what logging into a website means. It's a *parser*. Perhaps you should use something like Mechanize if you want to log into a site, and then use the embedded Nokogiri document that Mechanize uses. For that matter, REXML won't know how to log into a site either. – the Tin Man Feb 03 '15 at 17:09
  • Gotcha, sorry, I thought you were saying you would have known how to do it with Nokogiri, but not with this library. – Alexis Andersen Feb 03 '15 at 17:09
  • @theTinMan I am using Mechanize. I will update the code above – Redson Feb 03 '15 at 17:11
  • Thank you. Your question is confused and confusing. We need the minimum code necessary to duplicate the problem. It has to be syntactically correct. The XML and how you want to use it aren't well defined. Is `Jones` associated with `Jack` and really should be the last name, or is that an entirely separate person with no age? – the Tin Man Feb 03 '15 at 17:12

2 Answers


The problem is that we are treating age and name as completely separate collections. Instead, we need to treat each person element as the unit and read its name and age together.

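require 'nokogiri'

xml = "<your xml here />"
doc = Nokogiri::XML(xml)
persons = doc.xpath("//person")
# Build one hash per <person>, so a name and its age always stay together
persons_data = persons.map {|person|
  {
    name: person.xpath("./name").text,
    age: person.xpath("./age").text  # "" (not nil) when <age> is missing
  }
}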

This selects the person nodes and then extracts the name and age from each one, giving:

puts persons_data.inspect
#=> [{:name=>"Jack", :age=>"10"},
#    {:name=>"Jones", :age=>""},
#    {:name=>"Jon", :age=>"16"}]

So to get the name and age of the first person, you would call:

persons_data[0][:name] #=> "Jack"
persons_data[0][:age]  #=> "10"
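The same per-person idea also works with REXML, which the question already uses, so switching parsers is not strictly required. A minimal sketch, assuming the XML is already in a string (in the question it comes from xml.body); the variable names are illustrative. Unlike NodeSet#text above, REXML::XPath.first returns nil when there is no matching child, which produces exactly the ["10", nil, "16"] array the question asks for.

```ruby
require 'rexml/document'

# Sample data from the question; in the question this string comes from xml.body
xml = <<XML
<root>
  <person><name>Jack</name><age>10</age></person>
  <person><name>Jones</name></person>
  <person><name>Jon</name><age>16</age></person>
</root>
XML

document = REXML::Document.new(xml)

# Visit each <person> and look up its children relative to that element.
# REXML::XPath.first returns nil when there is no matching child.
people = REXML::XPath.match(document, "//person").map do |person|
  name = REXML::XPath.first(person, "name")
  age  = REXML::XPath.first(person, "age")
  { name: name && name.text, age: age && age.text }
end

ages = people.map { |h| h[:age] }
p ages             # => ["10", nil, "16"]
p people[1][:name] # => "Jones"
```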
Alexis Andersen
  • How do I get an output from this? – Redson Feb 03 '15 at 17:38
  • @Alias The map returns the set of data. I refactored the code to make this more obvious. – Alexis Andersen Feb 03 '15 at 17:43
  • How do I access the name and age for a given index? So how can I get the name and age for `persons_data[0]`? – Redson Feb 03 '15 at 17:48
  • You can return the data in whatever shape you find most useful. I use hashes like this because I think they expose the meaning of the information better. For instance, you could do `persons.map {|person| [person.xpath("./name").text, person.xpath("./age").text] }` instead and you would get `[["Jack","10"],["Jones",""],["Jon","16"]]` as your output. But then the meaning of the name and the age is hidden in your code. If I wanted to know the first age, I would have to call `persons_data[0][1]`, whereas with the hash I would get the same data by asking for `persons_data[0][:age]` – Alexis Andersen Feb 03 '15 at 17:50
  • Just to be clear, you are telling me that I have to access name and age by using string parsing on `persons_data` given an index? – Redson Feb 03 '15 at 17:57
  • I tried `persons_data[0]["name"]` and it keeps giving me `nil` or nothing instead of `Jack`. Is it because of the version of Ruby I am using (1.9.3)? – Redson Feb 03 '15 at 18:36
  • Wow, sorry, that was a fail on my side. It should be `persons_data[0][:name]` as the keys are symbols, not strings. – Alexis Andersen Feb 03 '15 at 19:03
  • Be careful using `xpath(...).text` as it isn't doing what you think it is. `xpath` returns a NodeSet, and `NodeSet.text` returns ALL the text nodes found inside the Node entries in the set. That can be quite different than applying `text` to a single node. Instead, use either `at` or `at_xpath` to retrieve just the desired node. – the Tin Man Feb 03 '15 at 20:24
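Following up on the accessor discussion in the comments: a Struct keeps the named access of the hash version, but its [] accepts a Symbol, a String, or a position, so both persons_data[0][:name] and persons_data[0]["name"] return "Jack", and a misspelled member raises an error instead of quietly returning nil. A minimal sketch using REXML (assumed here so the example needs no gems); Person is a hypothetical name, and node.elements["name"] returns the first matching child element or nil, much like the at/at_xpath suggestion above.

```ruby
require 'rexml/document'

# Hypothetical value type for one <person>; age is nil when <age> is missing
Person = Struct.new(:name, :age)

xml = <<XML
<root>
  <person><name>Jack</name><age>10</age></person>
  <person><name>Jones</name></person>
  <person><name>Jon</name><age>16</age></person>
</root>
XML

document = REXML::Document.new(xml)

persons_data = REXML::XPath.match(document, "//person").map do |node|
  name = node.elements["name"] # first matching child element, or nil
  age  = node.elements["age"]
  Person.new(name && name.text, age && age.text)
end

# Struct members can be read by method, by Symbol, by String, or by position
persons_data[0].name    # => "Jack"
persons_data[0][:name]  # => "Jack"
persons_data[0]["name"] # => "Jack"
persons_data[0][1]      # => "10"
persons_data[1].age     # => nil
```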

I'd do something like this:

require 'nokogiri'

doc = Nokogiri::XML(<<EOT)
<root>
  <person>
    <name>Jack</name>
    <age>10</age>
  </person>
  <person>
    <name>Jones</name>
  </person>
  <person>
    <name>Jon</name>
    <age>16</age>
  </person>
</root>
EOT

people = doc.search('person').each_with_object({}){ |person, h|
  age = person.at('age')
  h[person.at('name').text] = age ? age.text : nil
}

people # => {"Jack"=>"10", "Jones"=>nil, "Jon"=>"16"}

At that point, if I only want the ages, I'd use values:

people.values # => ["10", nil, "16"]

Retrieving a single person's age is trivial then:

people['Jon'] # => "16"
people['Jack'] # => "10"

I get this error when I'm using the .to_h method: undefined method `to_h'

My mistake. to_h is not available in older Rubies, but it isn't needed given how I'm generating the hash being returned. I adjusted the code above so it works in any Ruby that implements each_with_object.
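If you prefer building [name, age] pairs first, Hash[] is an alternative to to_h on Rubies that lack it: it turns an array of [key, value] pairs into a hash and is available in 1.9.3, whereas Array#to_h was only added in Ruby 2.1. A minimal sketch using REXML (assumed here so the example needs no gems):

```ruby
require 'rexml/document'

xml = <<XML
<root>
  <person><name>Jack</name><age>10</age></person>
  <person><name>Jones</name></person>
  <person><name>Jon</name><age>16</age></person>
</root>
XML

document = REXML::Document.new(xml)

# One [name, age] pair per <person>; age is nil when <age> is missing
pairs = REXML::XPath.match(document, "//person").map do |person|
  age = REXML::XPath.first(person, "age")
  [REXML::XPath.first(person, "name").text, age ? age.text : nil]
end

# Hash[] builds a hash from [key, value] pairs and exists in Ruby 1.9.3,
# whereas Array#to_h only arrived in Ruby 2.1
people = Hash[pairs]

people["Jones"] # => nil
people.values   # => ["10", nil, "16"]
```

As with the each_with_object version, the hash is keyed by name, so two people with the same name would collapse into one entry.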

the Tin Man