I have a Go Colly project that I use to crawl multiple links fetched from a database table, like below:
package main

import (
    "log"

    "github.com/gocolly/colly/v2"
    "github.com/gocolly/colly/v2/extensions"
)

// dbutil, City, spider and ThroughProxy are defined elsewhere in the project
func main() {
    dbutil.Init()
    defer dbutil.Close()
    db := dbutil.GetDB()
    rows, err := db.Query("SELECT id, link FROM cities_table")
    if err != nil {
        log.Fatal(err)
    }
    defer rows.Close()
    // read all rows first, then crawl each city
    var cities []City
    for rows.Next() {
        var city City
        if err := rows.Scan(&city.ID, &city.Link); err != nil {
            log.Fatal(err)
        }
        cities = append(cities, city)
    }
    for _, city := range cities {
        c := colly.NewCollector(
            colly.MaxDepth(1),
            colly.Async(true),
        )
        extensions.RandomUserAgent(c)
        c.Limit(&colly.LimitRule{DomainGlob: "*", Parallelism: 20})
        spider(c, db, city)
        baseURL := ThroughProxy(city)
        c.Visit(baseURL.String())
        c.Wait()
    }
}
In this example I have set Parallelism: 20, but because I create a new Colly collector for each record, and because MaxDepth is 1, it is not actually crawling in parallel. Is there any way to run this crawler in parallel and get the results faster?
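The only workaround I can think of is to keep one collector per city but launch each crawl from its own goroutine, capped by a counting semaphore. A rough sketch of what I mean (crawlAll is just a name I made up, the cap of 5 is arbitrary, it reuses City, spider and ThroughProxy from above, and it needs "sync" and "database/sql" imported):

// crawlAll runs one collector per city, with at most maxConcurrent
// cities being crawled at the same time
func crawlAll(db *sql.DB, cities []City) {
    const maxConcurrent = 5 // arbitrary cap, not from the Colly docs
    sem := make(chan struct{}, maxConcurrent) // counting semaphore
    var wg sync.WaitGroup
    for _, city := range cities {
        wg.Add(1)
        sem <- struct{}{} // blocks while maxConcurrent crawls are in flight
        go func(city City) {
            defer wg.Done()
            defer func() { <-sem }()
            // same per-city collector as before, so MaxDepth(1)
            // still applies to each website separately
            c := colly.NewCollector(
                colly.MaxDepth(1),
                colly.Async(true),
            )
            extensions.RandomUserAgent(c)
            c.Limit(&colly.LimitRule{DomainGlob: "*", Parallelism: 20})
            spider(c, db, city)
            baseURL := ThroughProxy(city)
            c.Visit(baseURL.String())
            c.Wait() // waits only for this city's collector
        }(city)
    }
    wg.Wait() // wait for every city to finish
}

Is something like this the right direction, or does Colly have a built-in way to run separate collectors in parallel?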
Note: if I move this part out of the loop:
c := colly.NewCollector(
    colly.MaxDepth(1),
    colly.Async(true),
)
then MaxDepth is no longer 1 for each website; it is 1 across all websites combined.
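To make the note concrete, this is the single-collector variant I mean (a sketch reusing the same helpers and the cities slice from above):

// single shared collector: created once, reused for every city
c := colly.NewCollector(
    colly.MaxDepth(1),
    colly.Async(true),
)
extensions.RandomUserAgent(c)
c.Limit(&colly.LimitRule{DomainGlob: "*", Parallelism: 20})
for _, city := range cities {
    spider(c, db, city)
    baseURL := ThroughProxy(city)
    c.Visit(baseURL.String())
}
c.Wait() // one Wait for the whole batch

With this version the requests do run in parallel, but the depth limit stops being per-site, which breaks my requirement.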