I have web-scraped the URLs corresponding to different tables on a webpage (using RSelenium) and stored them in an object called `URL`. In the next step, I want to write each of the tables in text format to the directory specified in the loop below. But for some reason the loop stops after 5 iterations, and I cannot figure out why. Any ideas or hints?

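    library(RSelenium)  # remDr is an RSelenium remoteDriver created earlier
    library(XML)        # provides readHTMLTable()

    for (i in 1:length(URL)) {
      remDr$navigate(URL[i])
      date <- Sys.Date()
      file <- paste("./WebScraping Connecting/Connecting_", "_", date, ".txt", sep = "")
      # Grab the rendered HTML and parse every <table> on the page
      y2 <- remDr$getPageSource()
      y2 <- unlist(y2)
      y3 <- readHTMLTable(y2, header = TRUE)
      # Collapse each column of the first table into a single string
      l <- unlist(lapply(y3[[1]], paste, collapse = " "))
      # Append the remaining tables as extra rows
      for (j in 2:length(y3)) {
        l1 <- unlist(lapply(y3[[j]], paste, collapse = " "))
        if (!is.null(l1)) {
          l <- rbind(l, l1)
        }
      }
      write(as.vector(l), file = file)
    }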
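Two things in the loop above may be worth checking, although neither is confirmed in this thread as the cause. First, `file` depends only on the date, so every iteration writes to (and overwrites) the same file. Second, if a page yields only one table, `2:length(y3)` is `2:1`, so the inner loop asks for `y3[[2]]` and fails with "subscript out of bounds"; a page that returns no tables at all (for example, one that has not finished loading) fails even earlier at `y3[[1]]`. Any such error ends the whole `for` loop. The following is a minimal sketch that guards against both. `save_all_tables()` and `fetch_tables` are hypothetical names, and `fetch_tables` is assumed to wrap the `remDr$navigate()` / `getPageSource()` / `readHTMLTable()` steps and return a list of data frames:

```r
# Sketch only. `fetch_tables` is a hypothetical function standing in for the
# remDr$navigate() / getPageSource() / readHTMLTable() steps; it is assumed to
# take one URL and return a list of data frames (one per HTML table).
save_all_tables <- function(urls, fetch_tables, out_dir = ".") {
  written <- character(0)
  for (i in seq_along(urls)) {
    # Report and skip a failing URL instead of letting the error end the loop
    tables <- tryCatch(
      fetch_tables(urls[i]),
      error = function(e) {
        message("URL ", i, " failed: ", conditionMessage(e))
        NULL
      }
    )
    if (length(tables) == 0) next  # covers NULL and pages with no tables

    # One line per table: each column pasted into one string, columns tab-separated
    out_lines <- vapply(tables, function(tab) {
      paste(vapply(tab, paste, character(1), collapse = " "), collapse = "\t")
    }, character(1))

    # The index i makes each file name unique, so iterations do not overwrite each other
    out_file <- file.path(out_dir,
                          sprintf("Connecting_%03d_%s.txt", i, format(Sys.Date())))
    writeLines(out_lines, out_file)
    written <- c(written, out_file)
  }
  written
}

# Hypothetical wrapper around the question's RSelenium steps (not run here):
# my_fetch <- function(u) {
#   remDr$navigate(u)
#   readHTMLTable(unlist(remDr$getPageSource()), header = TRUE)
# }
# save_all_tables(URL, my_fetch, "./WebScraping Connecting")
```

Writing one tab-separated line per table avoids `rbind()` altogether, so tables with different column counts cannot trigger the recycling warning. Note that this layout differs from the original `write(as.vector(l))` call, which puts each pasted column on its own line.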

Update: Sometimes the loop stops after only 3 iterations. It seems to work fine if I step through the loop by hand. However, I do get this warning message:

    In rbind(l, l1) :
      number of columns of result is not a multiple of vector length (arg 1)
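The warning means `rbind()` received rows of different lengths: `l` (arg 1) has fewer elements than `l1`, presumably because the tables on that page have different numbers of columns. R then recycles the shorter row to fill the matrix instead of stopping. A warning on its own does not interrupt a `for` loop (unless `options(warn = 2)` turns warnings into errors), so it probably does not explain the early stop by itself. A small base-R illustration with made-up strings:

```r
# Made-up strings standing in for two tables' pasted columns
l  <- c("x y", "1 2")          # first table: 2 columns
l1 <- c("p q", "5 6", "7 8")   # second table: 3 columns

res <- rbind(l, l1)
# Warning: number of columns of result is not a multiple of vector length (arg 1)

dim(res)     # 2 rows, 3 columns
res["l", 3]  # "x y": the shorter row was recycled to fill the gap

# The same warning inside a loop does not stop it
count <- 0
for (k in 1:3) {
  rbind(l, l1)        # warns on every pass
  count <- count + 1
}
count  # 3
```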
user3387899
  • What are the lengths of `URL` and `y3`? – Joel Carlson Jan 06 '16 at 12:32
  • `URL` is a vector of length 200 and `y3` is a list of length 3. – user3387899 Jan 06 '16 at 13:09
  • Do you get any errors or warnings when the loop stops? Does `l` hold any values after it stops? If you only get 5 iterations, it may be fruitful to step through the loop by hand and see whether the 6th value differs in some way from the previous ones... – Joel Carlson Jan 06 '16 at 13:18
  • Yes, output is generated for each iteration. Sometimes the loop stops after only 3 iterations. It works fine if I step through the loop by hand. I do, however, get the warning message: "In rbind(l, l1) : number of columns of result is not a multiple of vector length (arg 1)" – user3387899 Jan 06 '16 at 13:48
  • Instead of growing a data frame, I think you should make a list and add to it with syntax like `lst <- list()` and `lst[[name]] <- value`, and then at the end bind it all together using `do.call(rbind.data.frame, lst)`. – Joel Carlson Jan 06 '16 at 13:59
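The list-then-bind approach from the last comment can be sketched in base R as follows. The data frames are made up for illustration, and the list is indexed by iteration number rather than by a name; preallocating with `vector("list", n)` avoids growing the list, although `lst <- list()` as in the comment also works.

```r
# Collect one data frame per iteration in a preallocated list,
# then bind once at the end instead of growing an object with rbind().
n   <- 3
lst <- vector("list", n)
for (i in seq_len(n)) {
  lst[[i]] <- data.frame(iteration = i, value = i^2)  # made-up per-iteration result
}
combined <- do.call(rbind.data.frame, lst)
nrow(combined)  # 3
```

Note that `rbind.data.frame()` matches columns by name, so tables with different column sets (which is what the `rbind` warning suggests) would still need aligning before this final bind.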