I am web scraping a website in Jordan. The first page I'm scraping is https://alrai.com/search?date-from=2004-09-21&pgno=1. I'm trying to make R run through each date and, for each date, through the nested links that lead to the other result pages (pgno=1, 2, 3, etc.). The for loop works when I only use it to obtain the links for 2004-09-21, but I need to be able to move forward through the dates.

I thought wrapping the first for loop in another one that cycles through the dates would work. But the code as it stands only returns the 10 elements from a single page and doesn't go through the other page numbers.

  
for (i in seq_along(days)) {
  for (pagenumber in 1:10) {
    link = paste("https://alrai.com/search?date-from=", days[i], "&pgno=",
                 pagenumber, sep = "")
    page = read_html(link)
  }
}

readlink <- read_html(link)
  
text_title <- readlink %>% 
  html_elements(".font-700") %>%
  html_text2()

article_links <- readlink %>%
  html_elements(".font-700") %>%
  html_attr("href") 
alvaro49

1 Answer


You can scrape the first 5 pages with purrr::map_dfr instead of a for loop.

library(tidyverse)
library(rvest)

scraper <- function(page) {
  site <- str_c("https://alrai.com/search?date-from=2004-09-21&pgno=",
                page) %>%
    read_html()
  
  tibble(title = site %>%
           html_elements(".font-700") %>%
           html_text2())
}

map_dfr(1:5, scraper)
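
The loop in the question overwrites link on every iteration and only parses a page after the loops have finished, which is why a single page comes back. One way to cover several dates as well might look like the sketch below. The helper names build_url and scrape_page, the three-day date range, and the limit of 10 pages per date are all assumptions made for illustration; the ".font-700" selector and the "href" attribute are taken from the question, and the attribute may come back as NA if those elements are not links themselves.

```r
library(tidyverse)
library(rvest)

# Build the search URL for one date and one page number (hypothetical helper).
build_url <- function(day, page) {
  str_c("https://alrai.com/search?date-from=", day, "&pgno=", page)
}

# Scrape titles and links from one results page (hypothetical helper).
scrape_page <- function(day, page) {
  nodes <- read_html(build_url(day, page)) %>%
    html_elements(".font-700")

  tibble(
    date  = day,
    page  = page,
    title = html_text2(nodes),
    link  = html_attr(nodes, "href")  # NA when the element has no href
  )
}

# Assumed date range and page count; adjust both as needed.
days <- format(seq(as.Date("2004-09-21"), as.Date("2004-09-23"), by = "day"))
grid <- expand_grid(day = days, page = 1:10)

# Not run here: this sends one request per row of `grid`.
# results <- map2_dfr(grid$day, grid$page, scrape_page)
```

Each call returns its own tibble and map2_dfr row-binds them, so nothing is overwritten between iterations. Adding a short Sys.sleep() inside scrape_page is a reasonable courtesy to the server when the grid is large.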
Chamkrai