
I am learning text mining using R. I am trying to find all the links in an HTML document.

I tried getHTMLLinks(), but it returns an empty vector with the following warning:

url = "https://elections.maryland.gov/elections/2012/election_data/index.html"
getHTMLLinks(url)

character(0)
Warning message:
XML content does not seem to be XML: 'https://elections.maryland.gov/elections/2012/election_data/index.html' 
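That warning typically means the XML package was handed the URL string itself rather than the page content: getHTMLLinks() cannot fetch https pages on its own. A common workaround (a sketch, assuming the RCurl package is available) is to download the raw HTML first and pass the text to the parser:

```r
library(XML)
library(RCurl)

url <- "https://elections.maryland.gov/elections/2012/election_data/index.html"

# Download the page over https ourselves, then hand the HTML text
# (not the URL) to getHTMLLinks()
html <- getURL(url)
links <- getHTMLLinks(html)
head(links)
```

Note that this still returns the hrefs exactly as they appear in the page, so relative links come back as bare file names, the same as with rvest below.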

So I tried the "rvest" package to find the links. The code is as follows:

library(rvest)

links = xml2::read_html(url) %>% # read the html page
  html_nodes("a") %>%            # select all <a> nodes
  html_attr("href") %>%          # from each node, extract the href attribute
  .[grep("general.csv", ., ignore.case = T)] # keep only the *_General.csv links

It gives all the links as a character vector.

head(links)

"State_Congressional_Districts_2012_General.csv" "State_Legislative_Districts_2012_General.csv"  
[3] "All_By_Precinct_2012_General.csv"               "Allegany_County_2012_General.csv"              
[5] "Allegany_By_Precinct_2012_General.csv"          "Anne_Arundel_County_2012_General.csv" 

All of these are just the file names listed in the href attribute, but on the page each one is actually a hyperlink to a table.

It would be really great if anyone could help me with how to extract the full links instead of just the names of these hyperlinks.

  • If you want the full url leading to each of those files, affix the original url to the elements of that vector: `paste("https://elections.maryland.gov/elections/2012/election_data", links, sep = "/")` – paqmo Apr 14 '20 at 12:15
  • thanks for the workaround. But it might be possible that a table in web page is linked to different web site. – Rohit parihar Apr 14 '20 at 12:21
  • if that is the case, it will show the entire url--try this, for example: `read_html("https://en.wikipedia.org/wiki/Statistics") %>% html_nodes("a") %>% html_attr("href")` – paqmo Apr 14 '20 at 12:24
  • I am new to HTML so I don't know this concept. Thanks for the help. – Rohit parihar Apr 14 '20 at 12:34
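Rather than pasting the base URL on manually, the paqmo's approach can be generalized with xml2::url_absolute(), which resolves relative hrefs against the page's URL and leaves already-absolute links untouched. A self-contained sketch (the inline page below is a made-up stand-in for the election_data index, which uses relative hrefs):

```r
library(xml2)
library(rvest)

# A tiny inline page standing in for the real index (assumption: one
# relative href, as on the Maryland page, plus one absolute external link)
page <- read_html('<html><body>
  <a href="Allegany_County_2012_General.csv">Allegany</a>
  <a href="https://example.com/external.csv">External</a>
</body></html>')

hrefs <- page %>% html_nodes("a") %>% html_attr("href")

# url_absolute() resolves each href against the base URL; absolute
# links pass through unchanged, so cross-site links are handled too
base <- "https://elections.maryland.gov/elections/2012/election_data/index.html"
full <- xml2::url_absolute(hrefs, base)
full
#> [1] "https://elections.maryland.gov/elections/2012/election_data/Allegany_County_2012_General.csv"
#> [2] "https://example.com/external.csv"
```

With the real page you would replace the inline HTML with `read_html(url)` and get a fully qualified download link for every CSV.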

0 Answers