RSelenium: Can't extract piece of text from table

Question

Have a look at this webpage. I want to extract the text element '2013'. I use RSelenium for this, but if anyone knows how to do it using any other package that is fine too. My current script is the following:

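library(RSelenium)

startServer()
remDr <- remoteDriver(browserName = "chrome")
remDr$open(silent = TRUE)
# url holds the address of the page (not shown in this post)
remDr$navigate(as.character(url))
# Look up the table cell by its CSS selector and read its text
remDr$findElement("css selector", "#crosstable > table > tbody > tr:nth-child(2) > th:nth-child(2)")$getElementText()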

This gives the following error:

Error:   Summary: NoSuchElement
Detail: An element could not be located on the page using the given search parameters.
class: org.openqa.selenium.NoSuchElementException

The first thing I noticed was that this short piece of text cannot be selected with SelectorGadget. So I went looking for it in the page source and copied its selector path: #crosstable > table > tbody > tr:nth-child(2) > th:nth-child(2). But as the error shows, this does not work.

I am new to web scraping and have almost no HTML knowledge, so any clue on how to extract the text "2013" from the table is welcome.

EDIT - I found out how to do it

startServer()
remDr <- remoteDriver(browserName = "chrome")
remDr$open(silent = TRUE)
remDr$navigate(as.character(url))
# Switch into the outer iframe first
webElem <- remDr$findElement("id", "content_iframe")
remDr$switchToFrame(webElem)
# Then switch into the iframe nested inside it
webElem <- remDr$findElement("id", "passthrough")
remDr$switchToFrame(webElem)

# The table cell can now be located
remDr$findElement("xpath", '//*[@id="crosstable"]/table/tbody/tr[2]/th[2]')$getElementText()

The table is in an iframe which is itself inside another iframe. You need to use the `switchToFrame` method to enter the appropriate frame before you can reference the table elements. – jdharrison, Oct 21 '16 at 10:54
Alternatively, you can access the table frame directly at http://apps.who.int/gho/athena/data/GHO/HRH_26,HRH_33,HRH_28,HRH_25,HRH_27,HRH_31,HRH_29,HRH_30,HRH_32?profile=xtab&format=html&x-topaxis=GHO&x-sideaxis=COUNTRY;YEAR&x-title=table&filter=COUNTRY:* – jdharrison, Oct 21 '16 at 10:55
You can also download the data in many formats, for example CSV. – jdharrison, Oct 21 '16 at 11:01
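Since the table frame has its own URL, the cell could in principle be read without a browser at all. The following is a minimal sketch using rvest rather than RSelenium, under several assumptions: `extract_cell` is a hypothetical helper name, the toy markup in `demo` is invented for illustration and is not the real page, and the real frame is assumed to serve the table as static HTML under `#crosstable`. The XPath skips `tbody` because browsers often insert that element when rendering, so it may be absent from the raw markup.

```r
library(rvest)

# Hypothetical helper: read the text of one header cell from the table
# inside the element with id "crosstable".
extract_cell <- function(page, row = 2, col = 2) {
  # "table//tr" matches rows whether or not a <tbody> wrapper is present
  xpath <- sprintf('//*[@id="crosstable"]/table//tr[%d]/th[%d]', row, col)
  html_text2(html_element(page, xpath = xpath))
}

# Toy markup, invented for illustration only (not the real WHO page)
demo <- minimal_html('
  <div id="crosstable"><table>
    <tr><th>Country</th><th>Indicator</th></tr>
    <tr><th>Afghanistan</th><th>2013</th></tr>
  </table></div>')

extract_cell(demo)  # "2013" for the toy markup above
```

For the real page, one would pass `read_html(frame_url)` instead of `demo`, where `frame_url` is the address quoted in the comment above. Whether that address still responds, and whether its row and column positions match, has to be checked against the live markup.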

score 2 · Answer 1 · edited Oct 21 '16 at 13:47

webElem <- remDr$findElement("id", "content_iframe")
remDr$switchToFrame(webElem)
remDr$findElement("css selector", "#crosstable > table > tbody > tr:nth-child(2) > th:nth-child(2)")$getElementText()
# perform operation
remDr$switchToFrame(NULL)

edited Oct 21 '16 at 13:47 by evotopid
answered Oct 21 '16 at 13:38 by Saravanan

Thank you for the answer. However, I got the error "NoSuchElement" at the `findElement()` part. – user3387899 Oct 21 '16 at 13:54
After switching to frame 'content_iframe' I should switch once more to frame 'passthrough' and only then I can access the element. – user3387899 Oct 21 '16 at 14:09
Yes, you need to switch through multiple (nested) frames to access the element. – Saravanan Oct 21 '16 at 14:19
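The nested-frame switching discussed in these comments can be wrapped in a small helper. This is only a sketch: `switch_to_nested_frames` is a hypothetical function, not part of RSelenium, and it assumes every frame in the chain can be found by its `id` attribute, listed outermost first. It relies only on the `findElement` and `switchToFrame` methods already used in this thread.

```r
# Hypothetical helper (not part of RSelenium): enter a chain of nested
# iframes, outermost first, identified by their id attributes.
switch_to_nested_frames <- function(remDr, frame_ids) {
  remDr$switchToFrame(NULL)  # start from the top-level document
  for (id in frame_ids) {
    frame <- remDr$findElement("id", id)  # locate the iframe in the current context
    remDr$switchToFrame(frame)            # make it the new context
  }
  invisible(remDr)
}

# Usage, assuming remDr is an open remoteDriver that has already navigated
# to the page:
# switch_to_nested_frames(remDr, c("content_iframe", "passthrough"))
# remDr$findElement("xpath", '//*[@id="crosstable"]/table/tbody/tr[2]/th[2]')$getElementText()
# remDr$switchToFrame(NULL)  # return to the top-level document afterwards
```

Resetting to the top-level document at the start makes the helper independent of whichever frame the driver was left in by earlier calls.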
