NA when trying to loop through XPath nodes in R

Question

I'm trying to get data from this website, which contains real estate ads from Rio de Janeiro:

https://www.zapimoveis.com.br/aluguel/imoveis/rj+rio-de-janeiro/?gclid=EAIaIQobChMIrLjc2u7m2QIVhYGRCh3w9g0GEAAYASAAEgJKdvD_BwE#{%22parametrosautosuggest%22:[{%22Bairro%22:%22%22,%22Zona%22:%22%22,%22Cidade%22:%22RIO%20DE%20JANEIRO%22,%22Agrupamento%22:%22%22,%22Estado%22:%22RJ%22}],%22pagina%22:%221%22,%22ordem%22:%22Relevancia%22,%22paginaOrigem%22:%22ResultadoBusca%22,%22semente%22:%222109841258%22,%22formato%22:%22Lista%22}

My code works fine when I query the nodes one by one, but html_text() from the rvest package returns NA when I try to loop through the XPath nodes.

Here is the piece of code I've written so far:

library(rvest)
library(httr)



Url<-"https://www.zapimoveis.com.br/aluguel/imoveis/rj+rio-de-janeiro/?gclid=EAIaIQobChMIrLjc2u7m2QIVhYGRCh3w9g0GEAAYASAAEgJKdvD_BwE#{%22parametrosautosuggest%22:[{%22Bairro%22:%22%22,%22Zona%22:%22%22,%22Cidade%22:%22RIO%20DE%20JANEIRO%22,%22Agrupamento%22:%22%22,%22Estado%22:%22RJ%22}],%22pagina%22:%221%22,%22ordem%22:%22Relevancia%22,%22paginaOrigem%22:%22ResultadoBusca%22,%22semente%22:%222109841258%22,%22formato%22:%22Lista%22}"


website<- GET(Url)


#vectors that will store the data I want to collect
condominio<-vector()
Iptu<-vector()


#loop through nodes
for (i in 1:2){
  condominio[i]<- website %>%
    read_html() %>%
    html_node(xpath = "/html/body/div[3]/div[2]/section/div/article[i]/section[1]/a/div/span") %>%
    html_text()

  Iptu[i]<- website %>%
    read_html() %>%
    html_node(xpath = "/html/body/div[3]/div[2]/section/div/article[i]/section[1]/a/div/strong") %>%
    html_text()
}

If I replace the variable i with a fixed number, such as 2, the code works fine.

Could anyone help me find a way to extract data from more ads?

Thank you very much!

score 0 · Answer 1 · answered Mar 22 '18 at 02:35

I prefer CSS selectors to XPath here. html_nodes() returns every matching node at once, so no loop is needed. Try something like this:

library(rvest)
library(httr)

Url<-"https://www.zapimoveis.com.br/aluguel/imoveis/rj+rio-de-janeiro/?gclid=EAIaIQobChMIrLjc2u7m2QIVhYGRCh3w9g0GEAAYASAAEgJKdvD_BwE#{%22parametrosautosuggest%22:[{%22Bairro%22:%22%22,%22Zona%22:%22%22,%22Cidade%22:%22RIO%20DE%20JANEIRO%22,%22Agrupamento%22:%22%22,%22Estado%22:%22RJ%22}],%22pagina%22:%221%22,%22ordem%22:%22Relevancia%22,%22paginaOrigem%22:%22ResultadoBusca%22,%22semente%22:%222109841258%22,%22formato%22:%22Lista%22}"

website <- GET(Url)

# Parse the response once and reuse it for both queries
page <- read_html(website)

# html_nodes() returns all matching nodes, so the result is already a vector
# and no loop or pre-allocated vectors are needed
condominio <- page %>%
  html_nodes("article section a div span") %>%
  html_text()

Iptu <- page %>%
  html_nodes("article section a div strong") %>%
  html_text()
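
As for why the original loop returned NA: inside a quoted string, R does not substitute the loop variable, so the XPath really contains the literal text article[i]. XPath reads that as "an article that has a child element named i", which matches nothing. If you want to keep the loop, you have to build the expression yourself. Below is a minimal sketch using sprintf(). The HTML snippet and its values are made up for illustration and stand in for the real page, whose structure I have not verified.

```r
library(rvest)

# Hypothetical stand-in for the listing page: two <article> blocks
page <- read_html('
<html><body>
  <article><section><a><div><span>500</span><strong>100</strong></div></a></section></article>
  <article><section><a><div><span>800</span><strong>150</strong></div></a></section></article>
</body></html>')

# The literal "[i]" is never replaced by the loop index, so nothing matches
literal_result <- page %>%
  html_node(xpath = "//article[i]/section[1]/a/div/span") %>%
  html_text()

# Count the ads, then pre-allocate the result vectors
n_ads <- length(html_nodes(page, xpath = "//article"))
condominio <- character(n_ads)
iptu <- character(n_ads)

for (i in seq_len(n_ads)) {
  # sprintf() inserts the current index into the XPath string
  xp_span   <- sprintf("(//article)[%d]/section[1]/a/div/span", i)
  xp_strong <- sprintf("(//article)[%d]/section[1]/a/div/strong", i)

  condominio[i] <- page %>% html_node(xpath = xp_span) %>% html_text()
  iptu[i]       <- page %>% html_node(xpath = xp_strong) %>% html_text()
}
```

paste0() would work just as well as sprintf() here. Either way, the html_nodes() version above is usually shorter, since it does not depend on knowing the number of ads in advance.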
