I am using this list of user agents: https://developers.whatismybrowser.com/useragents/explore/hardware_type_specific/computer/
while scraping multiple pages from Amazon. I want to change the user agent during the scraping so that Amazon doesn't block me with a 503 error after 100 pages.
The problem I am trying to solve is that the code picks only one user agent from the list and then uses it for the whole loop. I would like the code to change the user agent at least 2 or 3 times during the loop, choosing those 2 or 3 user agents from the list in the link.
Tell me if you have further questions. The code is below:
library(rvest)
library(tidyverse)
ua_links <- read_html("https://developers.whatismybrowser.com/useragents/explore/hardware_type_specific/computer/")
ua <- ua_links %>% html_nodes(".code") %>% html_text(trim = TRUE)
df_monitors <- list()
for (i in 2:400) {
  # read the page (R is case-sensitive: sample(), not Sample(); draw one agent per request)
  page <- read_html(paste0("https://www.amazon.it/s?i=computers&rh=n%3A460159031&fs=true&page=", i), user_agent = sample(ua, 1))
  Sys.sleep(4)
  # read the parent nodes
  monitors <- page %>% html_nodes(xpath = "//div[@class='a-section a-spacing-small s-padding-left-small s-padding-right-small']")
  # parse information from each of the parent nodes
  description <- monitors %>% html_node(xpath = ".//*[@class='a-size-base-plus a-color-base a-text-normal']") %>% html_text(trim = TRUE)
  price <- monitors %>% html_node(xpath = ".//*[@class='a-price-whole']") %>% html_text(trim = TRUE)
  # put the data together into a data frame and add it to the list
  df_monitors[[i]] <- data.frame(description, price)
  print(paste("Page:", i))
}
#combine all data frames into 1
monitor_final <- bind_rows(df_monitors)
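
To make the goal concrete, here is a minimal sketch of the rotation I have in mind, not a tested solution. `assign_user_agents()` and `fetch_page()` are hypothetical helper names, the `agent-*` strings are placeholders for entries of `ua`, and I am assuming the header has to be sent with `httr::user_agent()` because I am not sure `read_html()` passes a `user_agent` argument on to the request.

```r
# Hypothetical helper: split `pages` into `n_agents` consecutive blocks and
# give each block one user agent drawn at random (without replacement) from `ua`.
assign_user_agents <- function(pages, ua, n_agents = 3) {
  stopifnot(n_agents >= 1, length(ua) >= n_agents, length(pages) >= n_agents)
  chosen <- sample(ua, n_agents)
  # Block index 1..n_agents for every page, in page order
  block <- ceiling(seq_along(pages) * n_agents / length(pages))
  chosen[block]
}

# Hypothetical helper (assumption: httr and xml2 are installed). It sends the
# request with an explicit User-Agent header and parses the response body.
fetch_page <- function(url, agent) {
  response <- httr::GET(url, httr::user_agent(agent))
  xml2::read_html(response)
}

# Demo with placeholder strings instead of the scraped `ua` vector
pages <- 2:400
ua_demo <- c("agent-A", "agent-B", "agent-C", "agent-D")
ua_per_page <- assign_user_agents(pages, ua_demo, n_agents = 3)
print(table(ua_per_page))  # three agents, 133 pages each

# Inside the loop the request would then become something like
# (base_url stands for the Amazon search URL prefix used above):
# page <- fetch_page(paste0(base_url, pages[k]), ua_per_page[k])
```

With this layout the user agent changes only at the block boundaries, so 399 pages and `n_agents = 3` give two switches during the loop; drawing a fresh `sample(ua, 1)` on every iteration would be the other extreme.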