I'm trying to scrape all the providers from this page: https://www.agedcareguide.com.au/nursing-homes/providers/vic

I'm using RSelenium on my Mac, and I start a Selenium server with Docker by running the following in Terminal:

docker run -d -p 4445:4444 selenium/standalone-firefox

Then I return to RStudio and run the following:

library(RSelenium)
remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4445L,
                      browserName = "firefox")
remDr$open()
remDr$navigate("https://www.agedcareguide.com.au/nursing-homes/providers/vic")
remDr$getTitle()

All is good.

Then I try to get the element by using:

provs <- remDr$findElement()

Inside the parentheses I have tried the XPath, the CSS selector, and everything else I can think of, but it always fails with:

Error in match.arg(using) : 'arg' should be one of “xpath”, “css selector”, “id”, “name”, “tag name”, “class name”, “link text”, “partial link text”

Anybody got any ideas where I'm going so terribly wrong?

  • This seems to work without an error: `provs <- remDr$findElement(using="class",value="c-result-list")`. Note that this only finds the element; it does not get its contents without a bit of further processing. An alternative would be to use `page <- remDr$getPageSource()` after your `navigate` line, and then use `rvest` or similar to extract what you want from `page`. – Andrew Gustar Mar 27 '18 at 10:00
  • Looking at the page source through right-clicking, the elements aren't there in text. Not sure how rvest would then be able to find them? And trying the findElement option returns an analysis on the browser. – Foothill_trudger Mar 27 '18 at 22:11
  • Yes, although the 'selector' code seems to work - but you might need to also build in a delay. See answer below. Good luck! – Andrew Gustar Mar 28 '18 at 09:21
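The error in the question is what `match.arg` reports when the first argument is not one of the allowed strategy names, which is what happens if the selector itself is passed as the first (`using`) argument. Here is a minimal sketch of that behaviour that runs without a browser. `find_element_sketch` is a hypothetical stand-in written for illustration, not RSelenium's own code; it only assumes that `findElement` takes `using` first and `value` second, as the working call in the comment above suggests.

```r
# Hypothetical stand-in that validates `using` the same way the error
# message in the question implies (via match.arg).
find_element_sketch <- function(using = c("xpath", "css selector", "id", "name",
                                          "tag name", "class name", "link text",
                                          "partial link text"),
                                value) {
  using <- match.arg(using)
  list(using = using, value = value)
}

# Passing the selector positionally puts it in `using`, so validation fails.
bad <- tryCatch(find_element_sketch("//section"),
                error = function(e) conditionMessage(e))

# Naming the strategy and the selector separately passes validation.
ok <- find_element_sketch(using = "css selector", value = ".c-result-list")

# match.arg allows unique partial matches, so "class" resolves to "class name".
partial <- find_element_sketch(using = "class", value = "c-result-list")
```

So with a live session the call would presumably look like `remDr$findElement(using = "css selector", value = "...")`, with the strategy name in `using` and the selector in `value`.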

1 Answer

A partial solution...

with RSelenium...

remDr$navigate(...)
Sys.sleep(20)  # the page keeps loading for some time
page <- remDr$getPageSource()
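As a possible alternative to a fixed 20-second sleep, one could poll until the results appear. This is only a sketch: `wait_until` is a hypothetical helper, not part of RSelenium, and the `.c-result-list` selector in the usage comment is taken from the comment thread on the question rather than verified against the page.

```r
# Hypothetical helper: call `check` repeatedly until it returns TRUE
# or `timeout` seconds have passed. Returns TRUE on success, FALSE on timeout.
wait_until <- function(check, timeout = 20, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    if (isTRUE(check())) return(TRUE)
    if (Sys.time() >= deadline) return(FALSE)
    Sys.sleep(interval)
  }
}

# Possible usage with a live session (not run here, needs the Selenium server):
# ready <- wait_until(function() {
#   length(remDr$findElements(using = "css selector", value = ".c-result-list")) > 0
# })
# if (ready) page <- remDr$getPageSource()
```

If the page fills in its list gradually, the container may exist before all providers have loaded, so a short extra pause after `ready` might still be needed.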

then, with rvest...

library(rvest)  # provides read_html(), html_node(), html_text() and %>%
provs <- page[[1]] %>% read_html() %>%
   html_node("#app > div > div.c-col-results > div:nth-child(3) > div > section") %>%
   html_text()

after a bit of tidying (split by \\n, remove blanks)...

provs
 [1] "AdventCare"                                     "Providing nursing homes" 
 [3] "Alexandra Gardens SRS"                          "Providing nursing homes" 
 [5] "Allbright Manor"                                "Providing nursing homes"
 [7] "Alliance Care Services Group"                   "Providing nursing homes" 
 etc...
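The tidying step described above (split on newlines, remove blanks) could be sketched roughly like this. The `raw_text` string is a made-up stand-in for what `html_text()` returns, using names from the output shown; the real text will have different whitespace, so treat this as a starting point only.

```r
# Made-up sample resembling the raw html_text() output (an assumption).
raw_text <- "\n  AdventCare \n\n Providing nursing homes \n\n Alexandra Gardens SRS \n Providing nursing homes \n"

# Split on newlines, trim surrounding whitespace, drop empty strings.
parts <- strsplit(raw_text, "\n", fixed = TRUE)[[1]]
parts <- trimws(parts)
provs <- parts[nzchar(parts)]

# Optionally keep only the provider names by dropping the repeated label.
provider_names <- provs[provs != "Providing nursing homes"]
```

Filtering on the label text assumes every provider is followed by exactly that string; if other labels appear on the page, taking every second element of `provs` may be more robust.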

Hopefully this will help get you started, although it is a tricky one!

Andrew Gustar
  • Thanks for this - it works to get the data out, which is great. Now I've got to master all the tidying, which is a bit frustrating! – Foothill_trudger Mar 29 '18 at 05:05