1

I am trying to put the values of some xml elements into an array using rexml. Here is an example of what I am doing:

doc = Document.new("<data><title>This is one title</title><title>This is another title</title></data>")
XPath.each( doc, "*/title") { |element| 
    puts element.text
}

However, that outputs:

[<title> ... </>, <title> ... </>] 

How can I get it to output an array containing "This is one title" and "This is another title"?

Jarred
  • While `puts` may convert its argument to a string anyway, you can have the XPath return the text node in the first place: `XPath.each(doc, "*/title/text()") {...` – LarsH Nov 17 '11 at 23:41
  • That was it. I had to call the text() method inside of each. Thank you! – Jarred Nov 17 '11 at 23:47
  • @LarsH: Sorry Lars, I only saw your comment after I posted my answer :( Please post an answer and I'll delete mine. – Dimitre Novatchev Nov 18 '11 at 04:46
  • @Dimitre: ok, will do. The reason I didn't post an answer in the first place is because I thought, based on lwburk's answer, that the problem was already solved. – LarsH Nov 18 '11 at 15:06
  • @Jarred, glad your problem is solved. I converted my comment to an answer... so if you want to upvote it, there it is. – LarsH Nov 18 '11 at 15:47

2 Answers

4

Moving my comment to an answer, per request:

While puts may convert its argument to a string anyway, you can have the XPath return the text node in the first place:

XPath.each(doc, "*/title/text()") {...
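
Since the question asks for an array rather than printed lines, here is a minimal sketch of collecting the strings instead. It assumes the sample document from the question; the variable name `titles` is only illustrative. `XPath.match` returns the matching nodes as an array, and `value` is called on each text node to get a plain string.

```ruby
require 'rexml/document'

doc = REXML::Document.new("<data><title>This is one title</title><title>This is another title</title></data>")

# XPath.match returns an array of the matching nodes (REXML::Text nodes here).
# Text#value turns each node into an ordinary, unescaped Ruby String.
titles = REXML::XPath.match(doc, "*/title/text()").map(&:value)

p titles
```

Mapping over the `<title>` elements and calling `text` on each should give the same strings for this input.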
LarsH
  • Thanks, @Dimitre. I still think lwburk's answer is excellent, and he did more work than I did: he tested the OP's code, found the behavior was as desired, and brought the results back to the OP to check that the OP wasn't just forgetting something obvious. – LarsH Nov 18 '11 at 15:45
  • @Dimitre, @LarsH - It's not just that `puts` is converting its output to a string, it's that `element.text` and selecting the `text()` in the first place are equivalent (given the OP's input). – Wayne Nov 18 '11 at 16:13
  • @lwburk: ok. Dimitre - this is part of why I prefer lwburk's answer... he knows something about the Ruby XPath API, and I don't. :-) – LarsH Nov 18 '11 at 17:00
  • I still wonder how in the world the OP got the output he's getting, with the code he showed. – LarsH Nov 18 '11 at 17:26
  • @lwburk, LarsH, OK guys, +1 for lwburk, too. I am interested primarily how using XPath can minimize the need for programming constructs in the host PL. – Dimitre Novatchev Nov 18 '11 at 17:27
  • @lwburk: How did you manage to cheat the system and have more than one user referred to in your comment? I'd ask this as a SO question but I know it would be deleted in no time ... :) – Dimitre Novatchev Nov 18 '11 at 17:29
  • @DimitreNovatchev - I just listed them independently, each with an `@`. I naively assumed it would work and I guess it did :) Also, +1 for *I am interested primarily how using XPath can minimize the need for programming constructs in the host PL.* I generally agree. – Wayne Nov 18 '11 at 17:54
  • @lwburk: It doesn't work for me -- I get a message that only one user reference / comment is allowed :( – Dimitre Novatchev Nov 18 '11 at 18:00
  • @Dimitre I wondered the same thing, how @lwburk referred to both of us with `@`. Then I decided the key was in the error message that said only one *additional* user; the author of the answer is automatically notified. However I just now tried putting both of you in this comment and it was accepted w/o error! Have the rules changed? – LarsH Nov 18 '11 at 20:12
  • @LarsH: Yes, I also noticed that sometimes this restriction isn't enforced. Probably SO bug :) – Dimitre Novatchev Nov 19 '11 at 01:12
3

Are you sure about that? Here's a complete program:

#!/usr/bin/ruby

require 'rexml/document'
include REXML

doc = Document.new("<data><title>This is one title</title><title>This is another title</title></data>")
XPath.each( doc, "*/title") { |element|
    puts element.text
}

Output:

This is one title
This is another title

Edit: It sounds like the OP has moved on, but I think there should be some clarification added here for future visitors. I upvoted @LarsH's good answer, but it should be noted that, given the OP's specific input, element.text should produce exactly the same output as would result from selecting the text() nodes in the first place. From the docs:

text( path = nil ) A convenience method which returns the String value of the first child text element, if one exists, and nil otherwise.

The sample input given in the original question shows <title> elements containing only one text node in each case. Therefore, these two methods are the same (in this case).

However, pay attention to this important note:

Note that an element may have multiple Text elements, perhaps separated by other children. Be aware that this method only returns the first Text node.

You can get all of an element's child text nodes using texts() (plural).
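
To make the difference concrete, here is a small sketch. The input is a made-up `<title>` with an empty `<br/>` child between two text nodes; it is not the question's data.

```ruby
require 'rexml/document'

# Hypothetical input: two text nodes separated by a child element.
doc = REXML::Document.new("<data><title>First<br/>Second</title></data>")
title = REXML::XPath.first(doc, "*/title")

first_only   = title.text                # String value of the first child text node only
all_children = title.texts.map(&:value)  # one String per direct child text node

p first_only
p all_children
```

Note that `texts` only covers direct children; text inside nested elements is not included.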

What I suspect a lot of people are really looking for is an equivalent of the DOM's textContent (or its illegitimate cousin innerText). Here's how you might do that in Ruby:

XPath.each(doc, "*/title") { |el|
    puts XPath.match(el,'.//text()').join
}

This joins all of the text descendants of each element into a single string.
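
As a self-contained sketch of the same idea that also builds an array, the snippet below wraps the join in a helper. The helper name `text_content` and the nested `<em>` in the input are assumptions made for illustration; neither is part of REXML or of the original question.

```ruby
require 'rexml/document'

# Hypothetical helper: concatenates every descendant text node of an element,
# roughly what the DOM's textContent does.
def text_content(element)
  REXML::XPath.match(element, ".//text()").map(&:value).join
end

# Hypothetical input: the first <title> contains a nested element.
xml = "<data><title>This is <em>one</em> title</title><title>This is another title</title></data>"
doc = REXML::Document.new(xml)

titles = REXML::XPath.match(doc, "*/title").map { |el| text_content(el) }

p titles
```

With this input, calling `text` on the first `<title>` would stop at the first text node, before `<em>`, whereas the helper also picks up the nested and trailing text.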

The short answer is that there's no short answer. Which one of these you want, if any, is highly context-specific. The only requirement in the original question is to "put the values of some xml elements into an array", which isn't really much of a specification.

Wayne
  • Yes, I am sure that the output I posted above is what I am getting. Is there any kind of configuration that could be causing this different output? – Jarred Nov 17 '11 at 23:40
  • @Jarred, if it still matters, you could post more of your code, e.g. your require and include statements. But you probably already checked lwburk's and they are the same? – LarsH Nov 18 '11 at 15:46