Questions tagged [go-colly]

Colly is a web scraping framework written in Go. Its import path is github.com/gocolly/colly (github.com/gocolly/colly/v2 for version 2). You will typically use this tag together with the main tag [go].

63 questions
2
votes
1 answer

Scraping a simple website with colly in golang does not return any data

I'm trying to scrape a simple website that looks like this:
    "Name Surname 1
    Name Surname 2
    Name Surname 3
    Name Surname 4"
  
I wrote some simple Go code: package main import…
Kin Lu
  • 53
  • 4
2
votes
1 answer

Gocolly scraping only certain links

While scraping this link, I just want to scrape the library links, but the code I wrote extracts all the links and I couldn't manage to filter them. (I'm parsing the URLs for later use in github…
2
votes
1 answer

Colly difference between Request.Visit and collector.Visit

I have written a colly script to collect port authority information from a site. func main() { // Temp Variables var tcountry, tport string // Colly collector c := colly.NewCollector() //Ignore the robot.txt …
CaptV89
  • 61
  • 1
  • 5
2
votes
0 answers

How to scrape an unordered list with go-colly?

I am trying to build a personal scraper for food recipes. I am able to get all the other elements except the food ingredients, which are in an unordered list. Here is a snippet of the page HTML: pagehtml My code so far, which doesn't find the strong element but prints…
M2R10
  • 23
  • 5
2
votes
1 answer

How to make gocolly crawl slower

I am using gocolly to harvest data from my website. The challenge is that gocolly is too aggressive when crawling the URLs. I have added a RandomDelay. Update: based on the answer, I changed c.Limit(&colly.LimitRule{ RandomDelay: 10 *…
kristian nissen
  • 2,809
  • 5
  • 44
  • 68
2
votes
1 answer

Unable to select an option from a dropdown for web scraping using gocolly/colly

I want to scrape data from the below public website using Golang gocolly/colly - https://eds.ospi.k12.wa.us/BusDepreciation/default.aspx?pageName=busSearch For the above website, I want to select all the "School District" options available in the…
Rahul Satal
  • 2,107
  • 3
  • 32
  • 53
2
votes
1 answer

Golang concurrent R/W to database

I'm writing some Go software that is responsible for downloading and parsing a large number of JSON files and writing that parsed data to a SQLite database. My current design has 10 goroutines simultaneously downloading/parsing these JSON files and…
1
vote
0 answers

Why does using async mode/queue when parsing with gocolly yield inconsistent results?

package main import ( "fmt" "strings" "sync/atomic" "time" "github.com/gocolly/colly/v2" "github.com/gocolly/colly/v2/queue" ) func main() { c := colly.NewCollector( ) c.SetRequestTimeout(time.Minute * 5) …
Don Draper
  • 463
  • 7
  • 21
1
vote
1 answer

Is it possible to crawl a CSR website using gocolly?

Is it possible to crawl CSR (client-side rendered/JS) websites using gocolly? I need to crawl many websites, and for that, I have a titleXpath in the database as follows: c.OnXML(titleXpath, func(e *colly.XMLElement) { data = append(data, e.Text) …
1
vote
1 answer

How to run go-colly in parallel mode with a depth of 1 and multiple links

I have a go-colly project that I use to crawl multiple links that I fetch from a table, as shown below: func main() { //db, err := sql.Open("postgres", "postgresql://postgres:postgres@localhost:5432/db?sslmode=disable") dbutil.Init() defer…
Farshad
  • 1,830
  • 6
  • 38
  • 70
1
vote
0 answers

Golang scraper: how to go to another page using the URLs in a struct

I'm writing a Golang scraper to get information from this site: https://www.allrecipes.com/recipes/17562/dinner/ I want to get: Name, URL, Descriptions, Ingredients, Photos, Directions. How can I use the links in the struct products URL to send the…
maka
  • 39
  • 7
1
vote
1 answer

Scraper colly in headless mode?

Scraper colly in headless mode? Hello, I am new to Golang and I have to make a scraper for my school in France. The site I have to scrape is www.allrecipes.com. On this site, I chose this page https://www.allrecipes.com/recipes/17562/dinner/ On…
maka
  • 39
  • 7
1
vote
1 answer

Colly - How to get the value of a child attribute?

Here is the sample page I have been working on: https://www.lazada.vn/-i1701980654-s7563711492.html Here is the element I want to get (the product title) ...
Chau Loi
  • 1,106
  • 1
  • 14
  • 36
1
vote
1 answer

Golang colly crawling error Too Many Requests

I'm trying to scrape some information from Google Trends. But every time that I try to get some data I receive the error Too Many Requests. Other sites are ok. My code: func Teste(searchTrend string) { searchTrend = strings.Trim(searchTrend, "…
1
vote
0 answers

Does the delay parameter in gocolly delay the website visit or the response?

When does the random delay in the colly limiter take place? Based on the example code from: http://go-colly.org/docs/examples/random_delay/ I wrote the following: func main() { url := "https://httpbin.org/delay/2" // Instantiate default…
Rikku
  • 39
  • 3