
I am writing a script in R (using RStudio) to scrape analyst share ratings and current share prices from the web:

library(rvest)
BKGURL <- 'http://www.marketbeat.com/stocks/LON/BKG/'   #analysts
BKGwebpage <- read_html(BKGURL)
BKGhtml <- html_nodes(BKGwebpage, "td:nth-child(5) , td:nth-child(4) , td:nth-child(3) , td:nth-child(2) , td:nth-child(1)")
BKG <- html_text(BKGhtml)                               #imports analyst text

BKGprice <- 'http://markets.investorschronicle.co.uk/research/Markets/Companies/Summary?s=BKG:LSE'
BKGpricewebpage <- read_html(BKGprice)
BKGpriceHTML <- html_nodes(BKGpricewebpage, "#wsod td.first")
BKGgbpp <- html_text(BKGpriceHTML)                      #imports current share price in text

before compiling them into a data frame (code for INN not shown to save space):

Code <- c('BKG', 'INN')
Analysts_Opinion <- c(BKG[2], INN[2])
Consensus <- c(BKG[4], INN[4])
Price_target <- c(BKG[6], INN[6])
Last_rating <- c(BKG[7], INN[7])
Current_price <- c(BKGgbpp[1], INNgbpp[1])

Scrapev1 <- data.frame(Code, Analysts_Opinion, Consensus, Price_target, Last_rating, Current_price)

Scrapev1 then gives

Code                                    Analysts_Opinion          Consensus Price_target Last_rating Current_price
1  BKG 2 Sell Rating(s), 6 Hold Rating(s), 8 Buy Rating(s) Hold (Score: 2.38) GBX 3,434.29   7/26/2016         2,650
2  INN                                     1 Buy Rating(s)  Buy (Score: 3.00)      GBX 190    2/2/2016        198.00

So the code works fine for importing the data, but I need to repeat the code in the top panel 350 times, swapping "BKG" for each of the 349 other ticker codes in the URLs and variable names. Currently I am stumped: copying and pasting each block would take quite some time, and surely there is a quicker way of doing it in R?

Any help or suggestions as to how to tackle this problem would be much appreciated. Apologies if the code is sloppy; I have taught myself R (poorly) by using this very website, and I come from a pharmacology background, with an interest in technology!

1 Answer


You could build the code as strings and then parse and evaluate them, but I wouldn't advise that. The best way, in my opinion, is to use lists and names. Something like:

library(rvest)
auxlist<- c('BKG', 'ASD', 'QWE')
URLS <- c() # Or list()
webpages <- list()
# etc...
for(comp in auxlist){
  URLS[[comp]] <- paste0('http://www.marketbeat.com/stocks/LON/', comp, '/')
  webpages[[comp]] <- read_html(URLS[[comp]])
  # etc... 
}
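Extending the loop above, you can also collect the scraped fields for each ticker into a one-row data frame inside the loop and bind them all together at the end. This is only a sketch: the URL patterns, CSS selectors, and element indices are copied from your question, and I'm assuming they hold for every ticker (they may not on every page):

```r
library(rvest)

codes <- c('BKG', 'INN')  # extend to all 350 ticker codes
rows <- list()

for (comp in codes) {
  # Build both URLs from the ticker code (patterns taken from the question)
  analyst_url <- paste0('http://www.marketbeat.com/stocks/LON/', comp, '/')
  price_url   <- paste0('http://markets.investorschronicle.co.uk/research/Markets/Companies/Summary?s=',
                        comp, ':LSE')

  # Scrape the same nodes as in the question
  analyst_text <- html_text(html_nodes(read_html(analyst_url),
      "td:nth-child(5) , td:nth-child(4) , td:nth-child(3) , td:nth-child(2) , td:nth-child(1)"))
  price_text <- html_text(html_nodes(read_html(price_url), "#wsod td.first"))

  # One row per ticker; indices assume the same page layout as BKG
  rows[[comp]] <- data.frame(Code = comp,
                             Analysts_Opinion = analyst_text[2],
                             Consensus        = analyst_text[4],
                             Price_target     = analyst_text[6],
                             Last_rating      = analyst_text[7],
                             Current_price    = price_text[1],
                             stringsAsFactors = FALSE)
}

Scrape <- do.call(rbind, rows)  # stack all one-row data frames
```

Storing each row in a named list and binding once with `do.call(rbind, ...)` avoids growing a data frame inside the loop, which is slow in R.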
Felipe Gerard
  • yes! this works. now I need to adjust the data frame – Andreas Wersäll Aug 12 '16 at 18:06
  • can I ask (for future purposes) why URLS is c() and webpages is list(). Much appreciated Felipe – Andreas Wersäll Aug 12 '16 at 18:06
  • Since I don't know the outcome of `read_html`, I don't know if it will return strings only. The safest way is always use lists because they aren't type-dependent. However, if they are all sure to be strings, it is more efficient to use a character vector (and starting it as `character(350)` instead of `c()`). Since there are only 350 options, maybe using only lists is more convenient and that way you can store anything. – Felipe Gerard Aug 12 '16 at 20:05
  • awesome. thank you very much felipe, much appreciated – Andreas Wersäll Aug 13 '16 at 11:28