I am trying to extract a link from a Wikipedia page. I load the page as follows:

wiki <- html("https://en.wikipedia.org/wiki/Category:1879_births")

I am interested in the following link:

v =  html_node(wiki, "a[href*='pages']")
<a href="/w/index.php?title=Category:1879_births&amp;pagefrom=Barrymore%2C+Ethel%0AEthel+Barrymore#mw-pages" title="Category:1879 births">next page</a> 

I want to extract the URL stored in the href attribute, but when I try to convert v to a character vector and split it, I get the following error: "cannot coerce type 'externalptr' to vector of type 'character'".

Does anyone know how to handle this "externalptr" type and extract the link?

Thanks in advance!

Jb_Eyd
  • The data is being stored at the C level. To extract it you can place `@href` before the brackets. Maybe try the xpath `//a/@href[contains(., 'pages')]` or since it ends with `pages` maybe `//a/@href['pages'=substring(., string-length(.) - 4)]` – Rich Scriven Sep 02 '15 at 16:37
  • @RichardScriven It's absolutely perfect. Thanks a lot ! – Jb_Eyd Sep 02 '15 at 16:51
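
For reference, the first XPath suggested in the comment above can be tried roughly like this. This is only a sketch: the inline HTML is a made-up stand-in for the category page so it runs offline, the names `page` and `hrefs` are illustrative, and `read_html()` is assumed to be available (it is what current rvest versions provide in place of `html()`).

```r
library(rvest)

# Hypothetical stand-in for the category page: two links, one ending in "#mw-pages".
page <- read_html('<p><a href="/wiki/Other">other</a>
  <a href="/w/index.php?title=Category:1879_births#mw-pages">next page</a></p>')

# Select the href attribute nodes themselves, keeping those that contain "pages".
hrefs <- html_nodes(page, xpath = "//a/@href[contains(., 'pages')]")

# html_text() on attribute nodes gives their values as a character vector.
links <- html_text(hrefs)
print(links)
```

Because the XPath selects the attribute rather than the `<a>` element, no separate attribute lookup is needed afterwards.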

1 Answer

This should extract the href attribute:

html_attr(v, "href")
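
A slightly fuller sketch of the same idea, in case it helps. The inline HTML below is an assumed, shortened stand-in for the real page so the example runs without network access; `page`, `link`, and `full_url` are illustrative names, and `read_html()` is assumed in place of `html()` for current rvest versions.

```r
library(rvest)

# Hypothetical, shortened copy of the "next page" link from the question.
page <- read_html('<p><a href="/w/index.php?title=Category:1879_births&amp;pagefrom=Barrymore#mw-pages" title="Category:1879 births">next page</a></p>')

# v is a node object (backed by an external pointer), not a string,
# which is why as.character(v) fails.
v <- html_node(page, "a[href*='pages']")

# html_attr() returns the attribute value as an ordinary character string.
link <- html_attr(v, "href")

# The href is relative, so prepend the site root to get a usable URL.
full_url <- paste0("https://en.wikipedia.org", link)
print(full_url)
```

Note that the `&amp;` entity in the HTML source comes back as a plain `&` in the extracted string.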
Chris Merkord