
I'm trying to scrape some information from Google and they aren't liking it. The vector contains 2487 Google pages, and from each of them I want to get the text of the first result.

I tried to create a loop to slow down the process but I'm very bad at it.

b is the vector that contains all the websites. First, I tried:

ContentScraper(b, CssPatterns = ".st") -> b

But then I tried to loop over it and slow it down, and I have no idea how:

for (i in seq_along(b)) {
  b[i] <- ContentScraper(Url = b[i], CssPatterns = ".st")
}

From the 55th page on, all I get is the error. Any thoughts on how to avoid it? Thanks.

Rodf
  • You can wrap it in a `tryCatch` and get past that error. – akrun May 23 '19 at 19:58
  • 429 is "Too Many Requests", likely due to rate limiting. Slow your request rate with an artificial `Sys.sleep(...)` or some other method of ensuring you do not exceed your quota. If you don't know what your limit is, I suggest you look at the user licensing for the website you are scraping and determine either (a) how to increase those limits, or (b) what the limits are so that you don't violate them. – r2evans May 23 '19 at 20:02
  • Well, how am I supposed to use `Sys.sleep(...)` or `tryCatch` in this loop that I created? I don't know where to put it. Thanks. – Rodf May 24 '19 at 15:31

2 Answers


Insert Sys.sleep(...) inside the loop, at the beginning of it.
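A minimal sketch of that (assuming `ContentScraper()` comes from the Rcrawler package, and reusing the `b` vector and `.st` pattern from the question; the 5-second pause is only a guess, since Google's actual limit isn't known):

library(Rcrawler)

results <- vector("list", length(b))
for (i in seq_along(b)) {
  Sys.sleep(5)  # pause before every request to stay under the rate limit (5 s is a guess)
  results[[i]] <- tryCatch(
    ContentScraper(Url = b[i], CssPatterns = ".st"),
    error = function(e) NA  # if a request still fails (e.g. another 429), record NA and keep going
  )
}

The tryCatch wrapper, as suggested in the comments, keeps the loop running if a single request still errors out.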

Nad Pat
Mariano

One way is to use

Sys.sleep(...)
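For example, a rough sketch of the Sys.sleep() route (the 2-5 second random delay is only illustrative; `b` and the `.st` pattern come from the question, and `ContentScraper()` is assumed to be Rcrawler's):

library(Rcrawler)

results <- lapply(b, function(url) {
  Sys.sleep(runif(1, min = 2, max = 5))  # random pause so requests don't arrive in a burst
  ContentScraper(Url = url, CssPatterns = ".st")
})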

Another way, if you're using Puppeteer or Playwright, is to adjust the interval between scrapes with something like Celery Beat.

Is that what you're looking for?

Yusuf Ganiyu