I'm trying to get data from this website, which contains real estate ads from Rio de Janeiro:
https://www.zapimoveis.com.br/aluguel/imoveis/rj+rio-de-janeiro/?gclid=EAIaIQobChMIrLjc2u7m2QIVhYGRCh3w9g0GEAAYASAAEgJKdvD_BwE#{%22parametrosautosuggest%22:[{%22Bairro%22:%22%22,%22Zona%22:%22%22,%22Cidade%22:%22RIO%20DE%20JANEIRO%22,%22Agrupamento%22:%22%22,%22Estado%22:%22RJ%22}],%22pagina%22:%221%22,%22ordem%22:%22Relevancia%22,%22paginaOrigem%22:%22ResultadoBusca%22,%22semente%22:%222109841258%22,%22formato%22:%22Lista%22}
My code works fine when I address the nodes one at a time, but html_text() from the rvest package returns NA when I try to loop over the XPath nodes.
Here is the piece of code I've written so far:
library(rvest)
library(httr)
Url<-"https://www.zapimoveis.com.br/aluguel/imoveis/rj+rio-de-janeiro/?gclid=EAIaIQobChMIrLjc2u7m2QIVhYGRCh3w9g0GEAAYASAAEgJKdvD_BwE#{%22parametrosautosuggest%22:[{%22Bairro%22:%22%22,%22Zona%22:%22%22,%22Cidade%22:%22RIO%20DE%20JANEIRO%22,%22Agrupamento%22:%22%22,%22Estado%22:%22RJ%22}],%22pagina%22:%221%22,%22ordem%22:%22Relevancia%22,%22paginaOrigem%22:%22ResultadoBusca%22,%22semente%22:%222109841258%22,%22formato%22:%22Lista%22}"
website<- GET(Url)
#vectors that will store the data I want to collect
condominio<-vector()
Iptu<-vector()
#loop through nodes
for (i in 1:2) {
  condominio[i] <- website %>%
    read_html() %>%
    html_node(xpath = "/html/body/div[3]/div[2]/section/div/article[i]/section[1]/a/div/span") %>%
    html_text()
  Iptu[i] <- website %>%
    read_html() %>%
    html_node(xpath = "/html/body/div[3]/div[2]/section/div/article[i]/section[1]/a/div/strong") %>%
    html_text()
}
If I replace the variable i with a fixed number, such as 2, the code seems to work fine.
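One possible explanation is that the i inside the quoted XPath is never replaced by the loop value. R does not interpolate variables in string literals, so the query asks for article[i], which XPath reads as "an article that has a child element named i" rather than as a position. Below is a minimal sketch of building the path with sprintf() instead. The inline HTML, its values, the shortened XPath and the names page, n_ads, base and literal are assumptions made for illustration, not the real page structure:

```r
library(rvest)

# Hypothetical stand-in for the results page; the real markup may differ.
page <- read_html('<html><body>
  <article><section><a><div><span>R$ 500</span><strong>R$ 80</strong></div></a></section></article>
  <article><section><a><div><span>R$ 900</span><strong>R$ 120</strong></div></a></section></article>
</body></html>')

n_ads <- 2
condominio <- character(n_ads)
iptu <- character(n_ads)

for (i in seq_len(n_ads)) {
  # sprintf() substitutes the current value of i for %d in the XPath
  base <- sprintf("//article[%d]/section[1]/a/div", i)
  condominio[i] <- page %>% html_node(xpath = paste0(base, "/span")) %>% html_text()
  iptu[i] <- page %>% html_node(xpath = paste0(base, "/strong")) %>% html_text()
}

# For comparison: the literal "article[i]" matches no node in this snippet
literal <- page %>%
  html_node(xpath = "//article[i]/section[1]/a/div/span") %>%
  html_text()
```

In this sketch the document is parsed once, outside the loop, rather than on every iteration. If every ad is wanted, dropping the index and calling html_nodes() on the same path may return all matches in one step.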
Could anyone help me find a way to extract data from more ads?
Thank you very much!