I am trying to use rvest to scrape the NASDAQ closing prices for the last 3 months so I can play around with the data.
The problem is that I can't seem to find the correct XPath to return the table. I've tried quite a few, using Chrome's 'Inspect Element' as well as the SelectorGadget plug-in for Chrome to find them.
It seems most people have done this with Python, but I am much more comfortable in R, and specifically with rvest for web scraping, so I'm hoping I'm not alone!
I've posted my code below. I believe the problem is in identifying the XPath. Here is an example of one of the web pages: http://finance.yahoo.com/q/hp?s=CSV
Once I get one ticker to work, I hope to put it in a loop, which is below my problem code.
Thank you!
library("rvest")
library("data.table")
library("xlsx")
# Problem code
company <- 'CSV'
url <- paste0("http://finance.yahoo.com/q/hp?s=", toString(company))
page <- read_html(url)  # read_html() replaces the deprecated html()
select_table <- '//table'  # this is the line I think is incorrect
fnames <- html_nodes(page, xpath = select_table) %>% html_table(fill = TRUE)
STOCK <- fnames[[1]]
STOCKS <- STOCK  # STOCKS is not defined yet at this point, so start it from this table
#---------------------------------------------------------------------
# Loop for use later
companylist <- read.csv('companylist.csv') #this is a list of all company tickers in the NASDAQ
STOCK <- data.frame()
STOCKS <- data.frame(Date=character(),Open=character(),High=character(),Low=character(),Close=character(),Volume=character(), AdjClose=character())
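for (i in seq_len(nrow(companylist))) {  # one pass per ticker instead of a hard-coded 3095
  company <- companylist[i, 1]
  url <- paste0("http://finance.yahoo.com/q/hp?s=", toString(company))
  page <- read_html(url)  # read_html() replaces the deprecated html()
  select_table <- '//*[@id="yfncsumtab"]/tbody/tr[2]/td[1]/table[4]'
  fnames <- html_nodes(page, xpath = select_table) %>% html_table(fill = TRUE)
  STOCK <- fnames[[1]]
  STOCKS <- rbind(STOCK, STOCKS)  # prepend this ticker's rows to the running table
}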
View(STOCKS)
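One way to narrow down whether the XPath itself is the problem is to try it on a small inline page first. The sketch below uses a made-up table (the `prices` id, the dates, and the values are hypothetical and not Yahoo's actual markup). It compares an XPath containing `tbody`, as Chrome's inspector tends to report it, with one that omits it. Browsers add `tbody` to the tree they display even when the page source has none, and the parser behind rvest may not, so a copied XPath can end up matching nothing.

```r
library(rvest)

# Hypothetical stand-in for a quote page; the markup and values are assumptions.
sample_html <- '<html><body>
  <table id="prices">
    <tr><th>Date</th><th>Close</th></tr>
    <tr><td>2015-06-01</td><td>25.10</td></tr>
    <tr><td>2015-06-02</td><td>25.40</td></tr>
  </table>
</body></html>'

page <- read_html(sample_html)

# XPath in the style copied from a browser inspector (includes tbody)
with_tbody <- html_nodes(page, xpath = '//table[@id="prices"]/tbody/tr')

# Same path with the tbody step dropped
without_tbody <- html_nodes(page, xpath = '//table[@id="prices"]/tr')

# Compare how many rows each XPath matched in the parsed source
cat("rows matched with tbody:   ", length(with_tbody), "\n")
cat("rows matched without tbody:", length(without_tbody), "\n")

# Select the table node itself and convert it to a data frame
table_node <- html_nodes(page, xpath = '//table[@id="prices"]')[[1]]
prices <- html_table(table_node)
print(prices)
```

If the two counts differ, dropping `/tbody` from the XPath in the loop above would be worth trying before anything else.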