
I'm using Ruby's Hpricot gem to parse HTML. I'd like to remove a single node from the document for use elsewhere, but I can't find a way to do it.

I see that I can remove an entire list of elements using an instance of Hpricot::Elements (`x = (doc/"div").remove`), but I only want to remove the first instance of a given tag.

Poking around, I see the suggestion to simply replace the element's contents with a comment node or whitespace (`x.inner_html = ''`), but that prevents me from using the node elsewhere.

What can I do?

Specs: Ruby 1.8.7, Hpricot 0.8.4

JellicleCat
  • Out of curiosity, must you use Hpricot? Answers involving [Nokogiri](http://nokogiri.org) would likely be easier to come by (you just append the node elsewhere in the document and it is removed from its previous location) and would likely be both more robust and efficient. – Phrogz Jun 25 '12 at 17:00
  • I started out using Nokogiri, but I'm working on someone's legacy code with ` – JellicleCat Jun 25 '12 at 17:22
  • Oh? `doc = Nokogiri::HTML(''); puts Nokogiri::HTML::DocumentFragment.new(doc,'Hello') #=> Hello ` – Phrogz Jun 25 '12 at 17:26
  • +1. I don't know how I got my bad results in the past. – JellicleCat Jun 25 '12 at 18:41

1 Answer


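Try this: take the first match and delete it from its parent's list of children. `x` still references the removed node afterwards, so you can reuse it elsewhere.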

x = (doc/"div").first
x.parent.children.delete(x) unless x.nil?
Cameron C