I tried this code, using rvest, to extract some nested information from a link, but it returns NA for the last variable, "links".

library("robotstxt")
library("dplyr")
library("rvest")

url<-"https://www.car.gr/classifieds/cars/?fs=1&condition=used&offer_type=sale&modified=15&st=private"

# check robots.txt for this URL; the domain is inferred from the path
paths_allowed(paths = url)

page<-read_html(url)

Title<-page %>% html_nodes(".title") %>% html_text()

Price<-page %>% html_nodes(".price-fmt") %>% html_text()

links<-page %>% html_nodes(".title") %>% 
       html_attr("h2") %>% paste0("https://www.car.gr", .)
Allan Cameron
Spink

1 Answer

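The class you are looking for is not .title but .row-anchor, and the attribute to extract is href rather than h2. h2 is a tag name, not an attribute, so html_attr("h2") returns NA for every node. Try this instead: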

links <- page %>% html_nodes(".row-anchor") %>% 
       html_attr("href")

It can be helpful to use the "inspector" in your browser to identify classes. In the same tool (in both Firefox and Chrome) you can run a full-text search for keywords. Just type in a sample link and you will quickly find the tag that holds it.
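
To see the difference without hitting the site, here is a minimal sketch on made-up markup. The HTML below is an assumption for illustration only (the real car.gr page structure may differ), and the names bad, good and full are hypothetical:

```r
library(rvest)

# Hypothetical markup loosely modelled on a listing row; NOT the real car.gr HTML
page <- read_html('
  <div>
    <a class="row-anchor" href="/classifieds/cars/view/1">
      <span class="title">Car A</span>
    </a>
    <a class="row-anchor" href="/classifieds/cars/view/2">
      <span class="title">Car B</span>
    </a>
  </div>')

# The .title nodes have no attribute called "h2", so every result is NA
bad <- page %>% html_nodes(".title") %>% html_attr("h2")

# href is an attribute of the enclosing <a class="row-anchor"> element
good <- page %>% html_nodes(".row-anchor") %>% html_attr("href")

# If the hrefs are relative (as assumed here), prepend the site root
full <- paste0("https://www.car.gr", good)
```

If the real hrefs turn out to be absolute already, the paste0() step should be dropped.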

Datapumpernickel