On this webpage: http://www.uefa.com/uefachampionsleague/season=2016/statistics/round=2000634/players/index.html

I am trying to use RSelenium to click each of the player names (which are links), scrape the individual player's page, go back, and continue with the next player.

# packages
library(RSelenium)
library(XML)

# navigate to the site
remDr <- remoteDriver$new()
remDr$open()
remDr$navigate("http://www.uefa.com/uefachampionsleague/season=2016/statistics/round=2000634/players/index.html")

# find all the needed links
player <- remDr$findElements(using = 'xpath', value = "//span/a")

# this confirms that there are 20 links
length(player)

# loop that is supposed to click through to all 20 pages, scrape some info, and proceed
for (i in 1:20) {
    player <- remDr$findElements(using = 'xpath', value = "//span/a")
    player[[i]]$clickElement()
    Sys.sleep(5)
    urlplayer <- remDr$getCurrentUrl()
    urlplayer2 <- htmlParse(urlplayer)
    hraci <- xpathSApply(urlplayer2, path = "//ul[@class='innerText']/li", fun = xmlValue)
    print(hraci)
    remDr$goBack()
}

I have run this code a few times, but after some iterations I always get the error: Error in player[[i]] : subscript out of bounds.

When I checked the value of the iterator after the last attempt, it was 7; on other runs it was 12 or some other number.

I have no clue why I am getting this error and would appreciate anyone's help!

BradzTech
Tomas H

1 Answer

I suggest a different approach, which does not need Selenium:

library(XML)
# parse the page that holds the full player table
doc <- htmlParse("http://www.uefa.com/statistics/uefachampionsleague/season=2016/statistics/round=2000634/players/_loadRemaining.html", encoding = "UTF-8")
n <- 3  # number of players to fetch
# link targets and names of the first n players
hrefs <- head( xpathSApply(doc, "//tr/td[1]/span/a", xmlGetAttr, "href"), n )
players <- head( xpathSApply(doc, "//tr/td[1]/span/a", xmlValue), n )
# save each player's page to a temporary file named after the player
for (x in seq_along(hrefs))
  download.file(paste0("http://www.uefa.com", hrefs[x]), file.path(tempdir(), paste0(players[x], ".html")) )

# read the tables from the first downloaded page
x <- 1
readHTMLTable(file.path(tempdir(), paste0(players[x], ".html")))
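
If the Selenium loop from the question is kept, one possible cause of the intermittent "subscript out of bounds" error (an assumption, not verified against the site) is that after goBack() the links are queried again before the list has finished loading, so findElements() returns fewer than i elements. A minimal sketch of a guard is shown below. wait_for_elements is a hypothetical helper written in base R, not part of RSelenium, and the timeout and poll values are arbitrary.

```r
# Hypothetical helper (not an RSelenium function): call `find_fn` repeatedly
# until it returns at least `n_min` elements, or stop with an error once
# `timeout` seconds have passed.
wait_for_elements <- function(find_fn, n_min, timeout = 10, poll = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    found <- find_fn()
    if (length(found) >= n_min) return(found)
    if (Sys.time() >= deadline) {
      stop(sprintf("only %d of %d elements found after %s seconds",
                   length(found), n_min, timeout))
    }
    Sys.sleep(poll)
  }
}

# Intended use inside the loop from the question (assumes `remDr` is the
# open remoteDriver session created there):
# player <- wait_for_elements(
#   function() remDr$findElements(using = "xpath", value = "//span/a"),
#   n_min = i
# )
# player[[i]]$clickElement()
```

With this guard the loop either gets a list long enough to index with i or fails with a message that says how many links were actually found, which makes the timing assumption easy to check.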
lukeA
  • Actually, in the meantime I downloaded all the webpages using XML, but my characters came out garbled. I see you added the parameter encoding="UTF-8". I tried getting the player names from _loadRemaining.html and now they are correct. So a big thank you for that – Tomas H Apr 03 '16 at 21:52