
When is the random delay in the colly limiter taking place?

Based on the example code at http://go-colly.org/docs/examples/random_delay/, I wrote the following:

package main

import (
    "fmt"
    "time"

    "github.com/gocolly/colly"
)

func main() {
    url := "https://httpbin.org/delay/2"

    // Instantiate default collector
    c := colly.NewCollector(
        // Attach a debugger to the collector
        // colly.Debugger(&debug.LogDebugger{}),
        colly.Async(true),
    )

    // Limit the number of threads started by colly to two
    // when visiting links whose domains match the "*httpbin.*" glob
    c.Limit(&colly.LimitRule{
        DomainGlob:  "*httpbin.*",
        Parallelism: 2,
        RandomDelay: 60 * time.Second,
    })

    c.OnRequest(func(r *colly.Request) {
        fmt.Printf("%v: Visiting %v \n", time.Now(), r.URL)
    })

    c.OnResponse(func(r *colly.Response) {
        fmt.Printf("%v: Got a response from %v \n", time.Now(), r.Request.URL)
    })

    // Start scraping in four threads on https://httpbin.org/delay/2
    for i := 0; i < 4; i++ {
        c.Visit(fmt.Sprintf("%s?n=%d", url, i))
    }
    // Start scraping on https://httpbin.org/delay/2
    c.Visit(url)
    // Wait until threads are finished
    c.Wait()
}

As you can see, I only added print statements to the OnRequest and OnResponse handlers. Running the code gives the following output:

2022-11-17 13:22:19.7909047 : Visiting https://httpbin.org/delay/2 
2022-11-17 13:22:19.7914046 : Visiting https://httpbin.org/delay/2?n=2 
2022-11-17 13:22:19.7909047 : Visiting https://httpbin.org/delay/2?n=0
2022-11-17 13:22:19.7909047 : Visiting https://httpbin.org/delay/2?n=1
2022-11-17 13:22:19.7914046 : Visiting https://httpbin.org/delay/2?n=3
2022-11-17 13:22:35.3481234 : Got a response from https://httpbin.org/delay/2 
2022-11-17 13:22:41.0593007 : Got a response from https://httpbin.org/delay/2?n=2 
2022-11-17 13:22:42.8457206 : Got a response from https://httpbin.org/delay/2?n=0 
2022-11-17 13:23:31.4748948 : Got a response from https://httpbin.org/delay/2?n=1 
2022-11-17 13:23:37.4104064 : Got a response from https://httpbin.org/delay/2?n=3 

So all the visits take place at (almost) the same time, and only the responses are delayed. Can I assume that colly visits all the websites immediately and only delays the responses, or does it actually delay the next visit? The latter would make more sense, since it would avoid getting blocked by the site, but in that case the OnRequest handler seems misleading to me.

Rikku
