Get text directly inside a tag in Nokogiri

Question

I have some HTML that looks like:

<dt>
  <a href="#">Hello</a>
  (2009)
</dt>

I already have all my HTML loaded into a variable called record. I need to parse out the year (2009 here), if it exists.

How can I get the text inside the dt tag but not the text inside the a tag? I've used record.search("dt").inner_text and this gives me everything.

It's a trivial question but I haven't managed to figure this out.

Note also that there are in fact two text nodes inside that `dt` (unless you parsed the HTML using the `noblanks` option): the first text node is `"\n "` before the `<a>`, and the second text node is `"\n (2009)\n"` after it. – Phrogz, May 29 '12 at 21:44

score 17 · Casper · Accepted Answer · 2012-05-29T13:01:45.453

To get all the direct children with text, but not any further sub-children, you can use XPath like so:

doc.xpath('//dt/text()')

Or if you wish to use search:

doc.search('dt').xpath('text()')

edited May 29 '12 at 13:01

answered May 29 '12 at 12:53


The methods above give you a NodeSet of [`XML::Text`](http://nokogiri.org/Nokogiri/XML/Text.html) nodes; you may want to use `at_xpath` (or just `at`) to get a single result, and then call the `.content` or `.text` methods on that node to get the text as a string from it. – Phrogz May 29 '12 at 21:38

score 12 · Answer 2 · answered May 29 '12 at 21:49

Using XPath to select exactly what you want (as suggested by @Casper) is the right answer.

def own_text(node)
  # Find the content of all child text nodes and join them together
  node.xpath('text()').text
end

Here's an alternative, fun answer :)

def own_text(node)
  node.clone(1).tap{ |copy| copy.element_children.remove }.text
end

Seen in action:

require 'nokogiri'
root = Nokogiri.XML('<r>hi <a>BOO</a> there</r>').root
puts root.text       #=> hi BOO there
puts own_text(root)  #=> hi  there

score 5 · Answer 3 · answered May 29 '12 at 12:46

The text you want is the last child node of the dt element, so you can access it with:

doc.search("dt").children.last.text

Chamnap

