
I have this simple HTML:

<div> Test <span> someting </span></div>

How can I retrieve only the div's own inner text, without the text of the nested span?

Using text retrieves all text from the div:

[1] pry(#<SandBox>)> first(:xpath, '//div').text
=> "Test someting"

Using text() in my XPath query results in the following error:

[2] pry(#<SandBox>)> first(:xpath, '//div/text()')
Capybara::Poltergeist::BrowserError: There was an error inside the PhantomJS portion of Poltergeist. This is probably a bug, so please report it. 
TypeError: 'null' is not an object (evaluating 'window.getComputedStyle(element).display')

However, using the same XPath with Nokogiri works:

[3] pry(#<SandBox>)> Nokogiri::HTML(page.html).xpath('//div/text()').text
=> " Test "

Is there a way to do it using only capybara without resorting to Nokogiri?

the Tin Man
egwspiti
  • As the error message says, this is very likely a bug, and you should report it to the developers. There is really no way around using `text()`, and if it doesn't work I would consider that a major bug. So you either wait for the fix or use another solution such as Nokogiri. – dirkk Apr 03 '14 at 13:20
  • I too searched and it seems there is no way to get the inner HTML at this time. I guess we'll have to wait for an update or make a pull request. I think at least one other Capybara driver supports the innerHTML method. – B Seven Sep 29 '14 at 22:53

1 Answer


You can always fall back to Nokogiri. open-uri is only needed if you fetch the HTML from a URL rather than parsing a string.

require 'nokogiri'
require 'open-uri'

2.2.0 :021 > html = Nokogiri::HTML::DocumentFragment.parse('<div> Test <span> someting </span></div>').child

 => #<Nokogiri::XML::Element:0x44a7082 name="div" children=[#<Nokogiri::XML::Text:0x44a63ee " Test ">, #<Nokogiri::XML::Element:0x44a62e0 name="span" children=[#<Nokogiri::XML::Text:0x44a3f04 " someting ">]>]> 

Then you can operate on it depending on what you want to grab. For the div's own text, take its first child node:

2.2.0 :072 > html.children.first

 => #<Nokogiri::XML::Text:0x45ea37c " Test "> 

2.2.0 :073 > html.children.first.text

=> " Test " 

or

2.2.0 :215 > html.children.first.content

 => " Test "

Good luck!

Drew B
  • I should mention that the reason for calling #child on :021 is so that you are working with a Nokogiri::XML::Element rather than the document fragment. The Element class inherits from the Node class. – Drew B Jan 12 '15 at 17:45