Have a look at this webpage. I want to extract the text element '2013'. I use RSelenium for this, but if anyone knows how to do it using any other package that is fine too. My current script is the following:

library(RSelenium)
startServer()
remDr <- remoteDriver(browserName = "chrome")
remDr$open(silent = TRUE)
remDr$navigate(as.character(url))  # url holds the address of the page
remDr$findElement("css selector", "#crosstable > table > tbody > tr:nth-child(2) > th:nth-child(2)")$getElementText()

This gives the following error:

Error:   Summary: NoSuchElement
Detail: An element could not be located on the page using the given search parameters.
class: org.openqa.selenium.NoSuchElementException

The first thing I noticed was that it is not possible to select this short piece of text using SelectorGadget. So I went looking for the piece of text in the source code and copied its specific selector path: #crosstable > table > tbody > tr:nth-child(2) > th:nth-child(2). But as the error shows, this does not work.

I am new to web scraping and have almost no HTML knowledge, so any clue on how to extract the text "2013" from the table is welcome.

EDIT - I found out how to do it

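library(RSelenium)
startServer()
remDr <- remoteDriver(browserName = "chrome")
remDr$open(silent = TRUE)
remDr$navigate(as.character(url))
# The table sits in an iframe nested inside another iframe: enter the outer one first
webElem <- remDr$findElement("id", "content_iframe")
remDr$switchToFrame(webElem)
# Then enter the inner iframe that holds the table
webElem <- remDr$findElement("id", "passthrough")
remDr$switchToFrame(webElem)

# The element can now be located from inside the inner frame
remDr$findElement("xpath", '//*[@id="crosstable"]/table/tbody/tr[2]/th[2]')$getElementText()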
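
The two frame switches above can be wrapped in a small helper. This is only a sketch: `switch_to_nested_frames` is a hypothetical name, not an RSelenium function, and it assumes every frame on the path can be found by its id. It relies only on `findElement` and `switchToFrame`, both of which are used in the code above.

```r
# Hypothetical helper (not part of RSelenium): walk into nested iframes by id.
switch_to_nested_frames <- function(remDr, frame_ids) {
  # Go back to the top-level document first, so the ids resolve from a known place.
  remDr$switchToFrame(NULL)
  for (id in frame_ids) {
    # Each id is looked up inside the frame entered in the previous iteration.
    frame <- remDr$findElement("id", id)
    remDr$switchToFrame(frame)
  }
  invisible(remDr)
}

# Usage with the frame ids from the edit above (needs a live browser session):
# switch_to_nested_frames(remDr, c("content_iframe", "passthrough"))
# remDr$findElement("xpath", '//*[@id="crosstable"]/table/tbody/tr[2]/th[2]')$getElementText()
```

The order of the ids matters: each frame is searched for from inside the previous one, outermost first.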

user3387899
  • The table is in an iframe which itself is in an iframe. You would need to use the `switchToFrame` method to access the appropriate frame to reference the table elements. – jdharrison Oct 21 '16 at 10:54
  • Alternatively, you can access the table frame directly at http://apps.who.int/gho/athena/data/GHO/HRH_26,HRH_33,HRH_28,HRH_25,HRH_27,HRH_31,HRH_29,HRH_30,HRH_32?profile=xtab&format=html&x-topaxis=GHO&x-sideaxis=COUNTRY;YEAR&x-title=table&filter=COUNTRY:* – jdharrison Oct 21 '16 at 10:55
  • You can also download the data in many formats, for example CSV. – jdharrison Oct 21 '16 at 11:01
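
If the table frame is fetched directly, as the comments suggest, no browser is needed and a static HTML parser may be enough. The sketch below uses rvest, which is not mentioned in the original post. `extract_cell` is a hypothetical helper, and the inline HTML is a made-up stand-in for the real table, included only to show the shape of the call. That the directly served page keeps the `crosstable` id is an assumption.

```r
library(rvest)

# Hypothetical helper: return the text of the first node matching an XPath.
extract_cell <- function(doc, xpath) {
  html_text2(html_element(doc, xpath = xpath))
}

# Made-up stand-in for the real table, only to illustrate the call.
sample_doc <- read_html('<div id="crosstable"><table>
  <tr><th>Country</th><th>Year</th></tr>
  <tr><th>Exampleland</th><th>2013</th></tr>
</table></div>')

# No tbody step in the path: browsers add <tbody> to the DOM,
# but the raw HTML seen by a parser may not contain it.
cell <- extract_cell(sample_doc, '//*[@id="crosstable"]//tr[2]/th[2]')
print(cell)
```

For the live data, `sample_doc` would be replaced by `read_html()` called on the URL from the comment above. The row and column positions in the XPath would need checking against the actual page.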

1 Answer

webElem <- remDr$findElement("id", "content_iframe")

remDr$switchToFrame(webElem)

remDr$findElement("css selector","#crosstable > table > tbody > tr:nth-child(2) > th:nth-child(2)")$getElementText()

# perform operation

remDr$switchToFrame(NULL)
evotopid
Saravanan