How do I get the text of the first, underlined, and last parts of the question and store each in its own variable, using Splinter?

See the HTML at the bottom. I want to make the following variables have the following values:

first_part = "Jingle bells, jingle bells, jingle all the"
second_part = "_______"
third_part = "! Oh what fun it is to ride in one-horse open sleigh!"

I went here and used these XPaths:

//*[@id="question_container"]/div[1]/span/text()[1] #this is first_part
//*[@id="question_container"]/div[1]/span/span      #this is second_part
//*[@id="question_container"]/div[1]/span/text()[2] #this is third_part

and applied them to the HTML below. They returned the expected values in the test, but in my program Splinter seems to reject them:

first_part = browser.find_by_xpath(xpath = '//*[@id="question_container"]/div[1]/span/text()[1]').text
second_part = browser.find_by_xpath(xpath = '//*[@id="question_container"]/div[1]/span/span').text
third_part = browser.find_by_xpath(xpath = '//*[@id="question_container"]/div[1]/span/text()[2]').text

print first_part
print second_part
print third_part

--------------    OUTPUT     -------------

[]
[]
[]

What am I doing wrong, why is it wrong, and how should I change my code?

The referred-to HTML (which was slightly edited to 'Jingle Bells' to better convey the problem) was retrieved using the browser.html feature of Splinter:

<div id="question_container" style="display: block;">
<div class="question_wrap">

<span class="question">Jingle bells, jingle bells, jingle all the
<span class="underline" style="display: none;">_______</span>
<input type="text" name="vocab_answer" class="answer" id="vocab_answer"></input>
! Oh what fun it is to ride in one-horse open sleigh!</span>

</div></div>
actinidia

1 Answer

The XPath passed to find_by_xpath() has to resolve to an element, not a text node.

One option would be to find the outer span, get its HTML and feed it to lxml.html:

from lxml.html import fromstring

# find_by_xpath() needs an element, so select the outer span itself
element = browser.find_by_xpath(xpath='//div[@id="question_container"]//span[@class="question"]')

# parse the span's HTML with lxml, which can address text nodes directly
root = fromstring(element.html)
first_part = root.xpath('./text()[1]')[0]
second_part = root.xpath('./span/text()')[0]
third_part = root.xpath('./text()[last()]')[0]

print first_part, second_part, third_part

Prints:

Jingle bells, jingle bells, jingle all the
_______ 
! Oh what fun it is to ride in one-horse open sleigh!
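
To see how the text nodes hang off the elements, here is a minimal sketch that runs without a browser. It is not the lxml approach above: it swaps in the standard library's xml.etree.ElementTree, which only works because the question's snippet happens to be well-formed XML (real pages often are not). The direct_text_nodes helper is a hypothetical name used for illustration only.

```python
import xml.etree.ElementTree as ET

# The markup from the question, parsed as XML (an assumption that
# holds for this snippet but not for arbitrary HTML).
HTML = """<div id="question_container" style="display: block;">
<div class="question_wrap">

<span class="question">Jingle bells, jingle bells, jingle all the
<span class="underline" style="display: none;">_______</span>
<input type="text" name="vocab_answer" class="answer" id="vocab_answer"></input>
! Oh what fun it is to ride in one-horse open sleigh!</span>

</div></div>"""


def direct_text_nodes(element):
    """Return the text nodes that are direct children of element.

    ElementTree stores them as element.text (before the first child)
    and as each child's .tail (the text following that child).
    """
    nodes = []
    if element.text is not None:
        nodes.append(element.text)
    for child in element:
        if child.tail is not None:
            nodes.append(child.tail)
    return nodes


root = ET.fromstring(HTML)
question = root.find(".//span[@class='question']")  # an element, not text
text_nodes = direct_text_nodes(question)

first_part = text_nodes[0].strip()
second_part = question.find("span").text
third_part = text_nodes[-1].strip()

print(len(text_nodes))          # 3 text nodes, not 2
print(repr(text_nodes[1]))      # the middle one is only a line break
print(first_part)
print(second_part)
print(third_part)
```

In this parse the span has three direct text nodes, and the second is just the line break between the inner span and the input. That is presumably why the answer's code uses text()[last()] rather than text()[2]. The strip() calls also drop the surrounding newlines, which the lxml version leaves in.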
alecxe
  • What do I use instead of `find_by_xpath()`? I can't find another relevant method in Splinter's documentation. – actinidia Dec 13 '14 at 03:11
  • @Princee you should find the `span` with `class="question"` first. Then, you can get the parts of the text, there are certainly multiple options. Can you provide a link to the web site for me to test? Thanks. – alecxe Dec 13 '14 at 03:12
  • @Princee thanks, please try out the solution in the updated answer. – alecxe Dec 13 '14 at 03:34
  • It works! Thank you! I have a question, though. Doesn't `find_by_xpath()` return an element object? How is such a variable type compatible with the `.html` method? – actinidia Dec 13 '14 at 03:58
  • @Princee glad it worked. The `.html` attribute of an element returns that element's HTML as a string, which is what we feed to `lxml.html`. – alecxe Dec 13 '14 at 04:00