How to scrape key statistics from Yahoo! Finance with R?

Question

Unfortunately, I am not an experienced scraper yet. However, I need to scrape key statistics of multiple stocks from Yahoo Finance with R.

I am somewhat familiar with scraping data directly from HTML using read_html(), html_nodes(), and html_text() from the rvest package. However, the MSFT key stats page is a bit complicated, and I am not sure whether the stats are kept in XHR, JS, or Doc. I am guessing the data is stored as JSON.

If anyone knows a good way to extract and parse the data from this web page with R, kindly answer my question. Many thanks in advance!

Or, if there is a more convenient way to extract these metrics via quantmod or Quandl, kindly let me know; that would be an extremely good solution!

The goal is to have tickers/symbols as row names/row labels, with the statistics as columns. An illustration of what I need can be found at this Finviz link:

https://finviz.com/screener.ashx

The reason I would like to scrape Yahoo Finance data is that Yahoo also includes key stats such as Enterprise Value and EBITDA.

EDIT: I meant to refer to the key statistics page, for example: https://finance.yahoo.com/quote/MSFT/key-statistics/ . The code should produce one data frame with stock symbols as rows and key stats as columns.

May help https://stackoverflow.com/questions/40245464/web-scraping-of-key-stats-in-yahoo-finance-with-r — NColl, Dec 30 '18 at 18:49
@NColl I did consider that topic earlier. However, the top answer relates to scraping Finviz instead.. — user3443027, Dec 30 '18 at 19:06

Roman · Answer 1 · 2018-12-31T00:55:08.320

Code

library(rvest)
library(tidyverse)

# Define stock name
stock <- "MSFT"

# Extract and transform data
df <- paste0("https://finance.yahoo.com/quote/", stock, "/financials?p=", stock) %>% 
    read_html() %>% 
    html_table() %>% 
    map_df(bind_cols) %>% 
    # Transpose
    t() %>%
    as_tibble()

# Set first row as column names (as a plain character vector)
colnames(df) <- as.character(unlist(df[1, ]))
# Remove first row
df <- df[-1,]
# Add stock name column
df$Stock_Name <- stock

Result

  Revenue `Total Revenue` `Cost of Revenu… `Gross Profit`
  <chr>   <chr>           <chr>            <chr>         
1 6/30/2… 110,360,000     38,353,000       72,007,000    
2 6/30/2… 96,571,000      33,850,000       62,721,000    
3 6/30/2… 91,154,000      32,780,000       58,374,000    
4 6/30/2… 93,580,000      33,038,000       60,542,000    
# ... with 25 more variables: ...
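Note that every column in the result above is of type <chr>, so the figures cannot be used in calculations as they are. A minimal sketch of one way to convert them, assuming the values are plain comma-grouped numbers like the ones shown; to_number is a hypothetical helper and not part of the answer, and the small data frame only stands in for the scraped tibble:

```r
# Hypothetical helper: turn strings such as "110,360,000" into numbers
to_number <- function(x) {
  as.numeric(gsub(",", "", x, fixed = TRUE))
}

# Stand-in for the scraped result, using values shown in the output above
example <- data.frame(
  `Total Revenue` = c("110,360,000", "96,571,000"),
  `Gross Profit`  = c("72,007,000", "62,721,000"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Convert every column while keeping the data frame structure
example[] <- lapply(example, to_number)
```

Columns that hold dates or the stock name should be left out of the conversion, and any cell that is not a number (for example a dash) becomes NA with a warning.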

edit:
Or, for convenience, as a function:

get_yahoo <- function(stock){
  # Extract and transform data
  x <- paste0("https://finance.yahoo.com/quote/", stock, "/financials?p=", stock) %>% 
    read_html() %>% 
    html_table() %>% 
    map_df(bind_cols) %>% 
    # Transpose
    t() %>%
    as_tibble()

  # Set first row as column names (as a plain character vector)
  colnames(x) <- as.character(unlist(x[1, ]))
  # Remove first row
  x <- x[-1,]
  # Add stock name column
  x$Stock_Name <- stock

  return(x)
}

Usage: get_yahoo(stock)

Thank you very much! However, I meant to refer to the key statistics page: https://finance.yahoo.com/quote/MSFT/key-statistics/ . The code should produce one data frame with stock symbols as rows and key stats as columns. — user3443027, Dec 30 '18 at 19:30
Well, you can just change the URL to get the result you want. Have you tried to run it? Do you need some help understanding the code? — Roman, Dec 31 '18 at 00:53
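To make the "change the URL" suggestion concrete, here is a rough sketch of the two pieces that would differ for the key statistics page: building the URL and reshaping the result into one row per symbol. Both key_stats_url and stats_to_row are hypothetical helpers, not part of the answer. The sketch assumes that html_table() returns two-column tables (metric name, value) for that page, which may not hold. The demo table contains placeholder strings rather than real figures.

```r
# Hypothetical helpers, not part of the answer above.

# Build the key-statistics URL for one ticker symbol
key_stats_url <- function(stock) {
  paste0("https://finance.yahoo.com/quote/", stock, "/key-statistics/")
}

# Reshape a two-column (metric, value) table into a one-row data frame
# with the ticker symbol as the row name
stats_to_row <- function(tbl, stock) {
  row <- as.data.frame(t(as.character(tbl[[2]])), stringsAsFactors = FALSE)
  colnames(row) <- as.character(tbl[[1]])
  rownames(row) <- stock
  row
}

# Stand-in for one parsed table; the values are placeholders, not real data
demo <- data.frame(
  metric = c("Enterprise Value", "EBITDA"),
  value  = c("placeholder-1", "placeholder-2"),
  stringsAsFactors = FALSE
)

# One row per symbol, stacked into a single data frame
rows <- lapply(c("MSFT", "XOM"), function(s) stats_to_row(demo, s))
key_stats <- do.call(rbind, rows)
```

In real use, the table passed to stats_to_row would come from read_html(key_stats_url(stock)) followed by html_table(), with the resulting tables bound together first.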

score 2 · Answer 2 · answered Dec 30 '18 at 19:55

I hope this is what you are looking for:

library(quantmod)
library(plyr)

what_metrics <- yahooQF(c("Price/Sales", 
                          "P/E Ratio",
                          "Price/EPS Estimate Next Year",
                          "PEG Ratio",
                          "Dividend Yield", 
                          "Market Capitalization"))

Symbols<-c("XOM","MSFT","JNJ","GE","CVX","WFC","PG","JPM","VZ","PFE","T","IBM","MRK","BAC","DIS","ORCL","PM","INTC","SLB")


metrics <- getQuote(paste(Symbols, sep="", collapse=";"), what=what_metrics)

To get the list of available metrics:

yahooQF()

score 0 · Answer 3 · answered Dec 30 '18 at 19:23

You can use lapply to get prices for more than one symbol:

library(quantmod) 

Symbols<-c("XOM","MSFT","JNJ","GE","CVX","WFC","PG","JPM","VZ","PFE","T","IBM","MRK","BAC","DIS","ORCL","PM","INTC","SLB")

StartDate <- as.Date('2015-01-01')

Stocks <-  lapply(Symbols, function(sym) {
  Cl(na.omit(getSymbols(sym, from=StartDate, auto.assign=FALSE)))
})

Stocks <- do.call(merge, Stocks)

In this case I get the closing prices; see the function Cl().
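The merged object has one column per symbol, and with quantmod these columns are typically named like XOM.Close. A small sketch for reducing them to the bare ticker symbols; strip_close is a hypothetical helper and not part of the answer:

```r
# Hypothetical helper: drop the ".Close" suffix from column names
strip_close <- function(nms) {
  sub("\\.Close$", "", nms)
}

# Example on plain names of the form quantmod produces for closing prices
tickers <- strip_close(c("XOM.Close", "MSFT.Close"))
```

Applied to the result above, this would be colnames(Stocks) <- strip_close(colnames(Stocks)).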

K. Peltzer


Thank you very much! However, I meant to refer to the key statistics page https://finance.yahoo.com/quote/MSFT/key-statistics/ – user3443027 Dec 30 '18 at 19:28
