I have changed the website to make this question clearer. I'm still facing the same issue: rvest alone doesn't seem to be enough, and the answer may be easier to obtain with RSelenium. The website is http://ravimaailma.fi/cg/tulokset/20/ and I want to extract the links in the main article list that lead to individual race results. The links look something like this: http://ravimaailma.fi/article/tulokset/pori-18-11-2017-tulokset/8718/

I'm trying plain rvest, as I thought that would be all I need here. SelectorGadget gives the CSS selector for the links as .article-title a, so my code is simply

library(rvest)
url <- "http://ravimaailma.fi/cg/tulokset/20/"

url %>%
  read_html() %>%
  html_nodes(".article-title a") %>%
  html_text()

This returns nothing. The website loads more results when you scroll down, but I thought I would at least get the first results. The code below returns some links, and links 28:32 look promising, but I think they come from the sidebar, not from the article list.

url %>%
  read_html() %>% 
  html_nodes("a") %>% 
  html_attr("href")
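To check whether any of the returned hrefs are article links rather than sidebar links, one option is to filter them by the path seen in the example link above. This is a minimal sketch in base R. The `hrefs` vector is a made-up sample (only the Pori URL comes from the question), `keep_result_links` is a hypothetical helper name, and the assumption that every result link contains `/article/tulokset/` is a guess based on that single example.

```r
# Hypothetical sample of what html_attr("href") might return.
# Only the Pori URL is taken from the question; the rest is invented.
hrefs <- c(
  "/cg/tulokset/20/",
  "http://ravimaailma.fi/article/tulokset/pori-18-11-2017-tulokset/8718/",
  "/article/tulokset/example-race/1234/",
  NA,
  "http://ravimaailma.fi/article/tulokset/pori-18-11-2017-tulokset/8718/"
)

# Keep only hrefs containing the article path, dropping NAs and duplicates.
# The default pattern is an assumption based on the one example link.
keep_result_links <- function(hrefs, pattern = "/article/tulokset/") {
  hits <- hrefs[!is.na(hrefs) & grepl(pattern, hrefs, fixed = TRUE)]
  unique(hits)
}

result_links <- keep_result_links(hrefs)
print(result_links)
```

If the filtered vector comes back empty on the real page, that would suggest the article links are not in the HTML that read_html() downloads and are added later by JavaScript.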

What am I doing wrong here, and can RSelenium help me?

Hakki
  • AFAIK, for dynamic pages you need `RSelenium`. I've started recently myself, and [these](https://rpubs.com/johndharrison/RSelenium-Basics) [two](http://rpubs.com/johndharrison/RSelenium-Docker) tutorials have helped tremendously, just in case you haven't seen them. – Val Aug 09 '17 at 08:49
  • Thank you. I have gone through those, but for some reason I haven't been able to get a connection. I will go through them again and see if I missed something. I think Docker didn't install correctly. – Hakki Aug 09 '17 at 08:55
  • Make sure `docker run hello-world` executes correctly. If so, run the `selenium` image of your choice (mine was Firefox) with debug so you can have a look through VNC. When establishing the connection, don't forget to specify the browser you're intending to use. – Val Aug 09 '17 at 08:57

2 Answers

Here is my partial answer. It still doesn't get everything, but maybe it helps someone. The code returns one link, for the first result. I'm not sure why it isn't returning them all. I'm using:

library(RSelenium)

# Start a Selenium server and a Chrome client
rD <- rsDriver(port = 4444L, browser = "chrome")
remDr <- rD[["client"]]
remDr$navigate("http://ravimaailma.fi/cg/tulokset/20/")

# Find the first element matching the selector and read its href
elem <- remDr$findElement(using = "css selector", value = ".article-title a")
elemtxt <- elem$getElementAttribute("href")

# Click the button to load more results (not used yet)
# button <- remDr$findElement(using = "id", value = "loadmore")
# button$clickElement()

remDr$close()

I haven't used the button click yet, but it seemed to work as well. The only problem is that I can't get all results from the site.
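One possible reason for getting a single link: in RSelenium, findElement() returns only the first element that matches, while findElements() (plural) returns a list of all matches. The sketch below wraps that in a small helper. `collect_hrefs` is a hypothetical name, `remDr` is assumed to be an open client such as rD[["client"]], and whether the selector matches every article link on this particular page is untested.

```r
# Sketch only: `remDr` is assumed to be an open RSelenium client,
# e.g. rD[["client"]] after remDr$navigate(...).
collect_hrefs <- function(remDr, css = ".article-title a") {
  # findElements() returns a list with every matching element,
  # whereas findElement() returns only the first match.
  elems <- remDr$findElements(using = "css selector", value = css)
  # getElementAttribute() returns a list, so flatten to a character vector.
  hrefs <- unlist(lapply(elems, function(e) e$getElementAttribute("href")))
  # unlist() gives NULL when nothing matched; return an empty vector instead.
  if (is.null(hrefs)) character(0) else hrefs
}

# Possible usage:
# links <- collect_hrefs(remDr)
```

Results that only appear after scrolling or after clicking the "loadmore" button would presumably need that click (plus a short wait) before calling the helper again.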

Hakki

[I'm not (yet) allowed to write comments, so I chose to post this as an answer.] RSelenium is not always necessary; you can also interact with a website directly using PhantomJS (see e.g. this example).

If you provide an example from the website instead of a local link to a .pdf, I can try to find out how to retrieve the data.

TomS
  • http://biathlonresults.com/ is the site that contains all the results data. It is a pretty weird site, as you have to click around and explore to find things. Not sure if that's what you're looking for? – Hakki Sep 18 '17 at 09:16
  • It's probably easier to start with a more specific piece of data instead of "I want to retrieve the whole website", as each page may require individual code. An example would be "2016/2017 -> result data -> BMW IBU [...] -> 2x6 + [...] Mixed Relay -> Top 3 Countries". – TomS Sep 18 '17 at 11:15
  • Ooh, OK. I could not pinpoint a page, as the address doesn't change when you click through to those results. Example data would be: http://biathlonresults.com/ -> 2016/2017 -> 25Nov-4Dec Oestersund -> Men 20km Individual -> Result – Hakki Sep 19 '17 at 13:18
  • I've tried a couple of things over the last few days, but with my limited knowledge the website seems to be "too interactive" for me to provide a working solution. I'm sorry! I hope there will be an SO user who sees this question and can guide you in the right direction. – TomS Sep 25 '17 at 07:54
  • I changed the website in the question, as the earlier one was maybe too interactive. Would this be a more suitable site? – Hakki Nov 18 '17 at 18:33