Scraping a web link with Rvest

Question

I am trying to figure out a way to extract a link from a wiki page. The web page is the following:

wiki <- html("https://en.wikipedia.org/wiki/Category:1879_births")

I am interested in the following link:

v =  html_node(wiki, "a[href*='pages']")
<a href="/w/index.php?title=Category:1879_births&amp;pagefrom=Barrymore%2C+Ethel%0AEthel+Barrymore#mw-pages" title="Category:1879 births">next page</a>

I want to extract the link that follows href, but when I try to convert v to a character string and split it, I get the following error message: "cannot coerce type 'externalptr' to vector of type 'character'"

Does anyone know how to deal with this "externalptr" type and extract the link?

Thanks in advance!

The data is being stored at the C level. To extract it you can place `@href` before the brackets. Maybe try the xpath `//a/@href[contains(., 'pages')]` or since it ends with `pages` maybe `//a/@href['pages'=substring(., string-length(.) - 4)]` — Rich Scriven, Sep 02 '15 at 16:37
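To make the XPath suggestion concrete, here is a minimal sketch run against a small inline HTML fragment instead of the live page. The fragment and its link targets are made up for illustration, and the sketch assumes a version of rvest that provides `read_html()`.

```r
library(rvest)

# Hypothetical stand-in for the category page: one ordinary link and
# one "next page" link whose href contains "pages".
page <- read_html('<html><body>
  <a href="/wiki/Ethel_Barrymore">Ethel Barrymore</a>
  <a href="/w/index.php?title=Category:1879_births#mw-pages">next page</a>
</body></html>')

# Select the href attribute nodes directly, keeping only those whose
# value contains "pages".
href_nodes <- html_nodes(page, xpath = "//a/@href[contains(., 'pages')]")

# The text of an attribute node is its value, returned as a character vector.
hrefs <- html_text(href_nodes)
print(hrefs)
```

Selecting the attribute in the XPath expression skips the intermediate `<a>` node, so there is no pointer-backed object to coerce.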

score 0 · Answer 1 · answered Sep 02 '15 at 21:49

This should extract the href attribute:

html_attr(v, "href")
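As a self-contained illustration of the same call, the sketch below uses an inline HTML fragment (made up for this example, not the real page) and assumes an rvest version with `read_html()`. Because the href on the page is relative, it also shows one possible way to turn it into a full URL with `xml2::url_absolute()`; the base URL is an assumption taken from the address in the question.

```r
library(rvest)

# Hypothetical fragment mimicking the "next page" link from the question.
page <- read_html('<html><body>
  <a href="/wiki/Ethel_Barrymore">Ethel Barrymore</a>
  <a href="/w/index.php?title=Category:1879_births&amp;pagefrom=Barrymore%2C+Ethel#mw-pages" title="Category:1879 births">next page</a>
</body></html>')

# Same selector as in the question: first <a> whose href contains "pages".
v <- html_node(page, "a[href*='pages']")

# html_attr() returns the attribute value as a plain character string,
# so no coercion of the node object is needed.
href <- html_attr(v, "href")
print(href)

# The href is relative; resolve it against the site root (assumed base URL).
next_url <- xml2::url_absolute(href, "https://en.wikipedia.org/")
print(next_url)
```

The resulting string can be passed back to the page-reading function to follow the "next page" link.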

Chris Merkord