
I'm not familiar with web scraping, although I've managed to extract some content on a few occasions. This time, although my problem looks simple, I can't get the string containing the symbol, name and market of a quote page. That is, I'd like to extract the string "Merck KGaA (MRK.DE) -XETRA" from the URL below. I've tried the following code, which returns a few tables but not the piece I'm looking for:

url <- 'https://finance.yahoo.com/q?s=MRK.DE&ql=0'
require(httr)
require(XML)
tables <- readHTMLTable(content(GET(url)), header = TRUE)
nopeva

1 Answer


This probably isn't the most efficient script here, but it'll definitely work:

library(rvest)
library(magrittr)
library(stringr)

url <- 'https://finance.yahoo.com/q?s=MRK.DE&ql=0'

html(url) %>%                   # parse the page
  html_nodes("h2") %>%          # grab every <h2> element
  extract2(3) %>%               # the quote title is the third one
  as('character') %>%           # convert the node to raw HTML text
  str_replace('<h2>', '') %>%   # strip the opening tag
  str_replace('</h2>', '')      # ...and the closing tag

[1] "Merck KGaA (MRK.DE)"
maloneypatr
  • Many thanks for your help. Do you know why the string is not captured by a more direct call such as the one I tried? On one hand, I would like to either use base R or some standard packages such as `XML` or `httr`/`RCurl` if possible. On the other hand, the simpler the code the better. – nopeva Dec 15 '14 at 16:13
  • `rvest` is actually a package recently released by Hadley to mirror Beautiful Soup from Python. I have found it to be the most intuitive without a strong working knowledge of HTML. In your example, `readHTMLTable` will only scrape table data from the given page and it doesn't look like "Merck..." sits within a table. – maloneypatr Dec 15 '14 at 19:45
  • You can make it a bit simpler: `html(url) %>% html_nodes("h2") %>% html_text() %>% .[[3]]` – hadley Dec 16 '14 at 06:13
  • Thanks for the clarification @hadley! I definitely like your answer better. I was still tinkering with your package when I came across the question. As always, thanks for making my life easier. :) – maloneypatr Jan 21 '15 at 20:36
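For readers on newer versions of rvest: `html()` was later deprecated in favour of `read_html()`, and `html_text()` (as in hadley's comment above) replaces the manual tag stripping. A sketch on an inline stand-in snippet, since the live page's markup may have changed since 2014:

```r
library(rvest)

# Inline stand-in for the Yahoo page; on the live page you would call
# read_html(url). The third-<h2> position is an assumption carried over
# from the original answer.
page <- read_html('<html><body><h2>A</h2><h2>B</h2><h2>Merck KGaA (MRK.DE)</h2></body></html>')

page %>%
  html_nodes("h2") %>%   # every <h2> element
  html_text() %>%        # their text content, no tags
  .[[3]]                 # keep the third one
#> [1] "Merck KGaA (MRK.DE)"
```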