
I'm not familiar with web scraping, although I've managed to extract some content on a few occasions. This time, although my problem looks simple, I can't get the string containing the symbol, name and market of a quote page. That is, I'd like to extract the string "Merck KGaA (MRK.DE) -XETRA" from the URL below. I've tried the following code, which returns a few tables but not the piece I'm looking for:

url <- 'https://finance.yahoo.com/q?s=MRK.DE&ql=0'
require(httr)
require(XML)
tables <- readHTMLTable(content(GET(url)), header = TRUE)
nopeva

1 Answer


This probably isn't the most efficient script here, but it'll definitely work:

library(rvest)
library(magrittr)
library(stringr)

url <- 'https://finance.yahoo.com/q?s=MRK.DE&ql=0'

html(url) %>%                   # parse the page
  html_nodes("h2") %>%          # grab every <h2> element
  extract2(3) %>%               # the quote title is the third one
  as('character') %>%           # convert the node to raw HTML text
  str_replace('<h2>', '') %>%   # strip the opening tag
  str_replace('</h2>', '')      # ...and the closing tag

[1] "Merck KGaA (MRK.DE)"
maloneypatr
  • Many thanks for your help. Do you know why the string is not captured by a more direct call such as the one I tried? On one hand, I would like to either use base R or some standard packages such as `XML` or `httr`/`RCurl` if possible. On the other hand, the simpler the code the better. – nopeva Dec 15 '14 at 16:13
  • `rvest` is actually a package recently released by Hadley to mirror Beautiful Soup from Python. I have found it to be the most intuitive without a strong working knowledge of HTML. In your example, `readHTMLTable` will only scrape table data from the given page and it doesn't look like "Merck..." sits within a table. – maloneypatr Dec 15 '14 at 19:45
  • You can make it a bit simpler: `html(url) %>% html_nodes("h2") %>% html_text() %>% .[[3]]` – hadley Dec 16 '14 at 06:13
  • Thanks for the clarification @hadley! I definitely like your answer better. I was still tinkering with your package when I came across the question. As always, thanks for making my life easier. :) – maloneypatr Jan 21 '15 at 20:36
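For readers on newer versions of rvest: `html()` was later deprecated in favour of `read_html()`, and `html_text()` (as in hadley's comment above) replaces the manual tag stripping. A sketch on an inline stand-in snippet, since the live page's markup may have changed since 2014:

```r
library(rvest)

# Inline stand-in for the Yahoo page; on the live page you would call
# read_html(url). The third-<h2> position is an assumption carried over
# from the original answer.
page <- read_html('<html><body><h2>A</h2><h2>B</h2><h2>Merck KGaA (MRK.DE)</h2></body></html>')

page %>%
  html_nodes("h2") %>%   # every <h2> element
  html_text() %>%        # their text content, no tags
  .[[3]]                 # keep the third one
#> [1] "Merck KGaA (MRK.DE)"
```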