
I want to extract the hyperlinks from this website across different searches (don't be scared that it is in Danish). The hyperlinks can be found to the right (v15, v14, v13, etc.) [example]. The website I am trying to scrape seems to load its search results via some kind of jQuery/JavaScript. This is based on my very limited knowledge of HTML and might be wrong.

I think this fact makes the following code unable to run (I use the "rvest" package):

library(rvest)

sdslink <- "http://karakterstatistik.stads.ku.dk/#searchText=&term=&block=&institute=null&faculty=&searchingCourses=true&page=1"
s_link <- sdslink %>%
  read_html(encoding = "UTF-8") %>%
  html_nodes("#searchResults a") %>%
  html_attr("href")

I have found a method that works, but it requires me to download the pages manually with "right click" + "save as" for each page. This, however, is infeasible, as I want to scrape a total of 100 pages for hyperlinks.

I have tried to use the jsonlite package combined with httr, but I cannot seem to find the right .json endpoint.

I hope you might have a solution, either to get jsonlite to work, to automate the "save as" approach, or a third, more clever path.

ScrapeGoat

1 Answer


One approach is to use RSelenium. Here's some simple code to get you started. I assume you already have RSelenium and a webdriver installed. Navigate to your site of interest:

library(RSelenium)
startServer()
remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4444, 
                      browserName = "chrome")
remDr$open(silent = TRUE)
remDr$navigate("http://karakterstatistik.stads.ku.dk/")

Find the submit button by inspecting the source:

webElem <- remDr$findElement("name", "submit")
webElem$clickElement()

Save the first 5 pages:

html_source <- vector("list", 5)
for (i in 1:5) {
  html_source[[i]] <- remDr$getPageSource()
  webElem <- remDr$findElement("id", "next")
  webElem$clickElement()
  Sys.sleep(2)  # give the next page time to load
}
remDr$close()
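
Once the page sources are saved, you can extract the hyperlinks with rvest, much like in your original attempt. A minimal sketch (note that the `#searchResults a` selector is taken from the question and may need adjusting to the site's actual markup):

library(rvest)

# getPageSource() returns a list whose first element is the HTML string,
# so parse src[[1]] for each saved page and collect the hrefs.
links <- unlist(lapply(html_source, function(src) {
  src[[1]] %>%
    read_html() %>%
    html_nodes("#searchResults a") %>%
    html_attr("href")
}))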
Weihuang Wong
  • Thanks a bunch for the help; it was **exactly what I needed**. I had to look a bit into Selenium before I could get it to work. [This answer](http://stackoverflow.com/a/31188481/6717092) to another question was very useful if other people are experiencing trouble installing. – ScrapeGoat Aug 16 '16 at 15:24
  • You're welcome -- please mark the answer as accepted if your question has been answered satisfactorily. – Weihuang Wong Aug 16 '16 at 15:34
  • Done and thanks. I have a fairly quick [follow-up question](http://stackoverflow.com/q/38991773/6717092) – ScrapeGoat Aug 17 '16 at 08:36