Questions tagged [go-colly]

Colly is a web scraping framework written in Go. Import it as github.com/gocolly/colly (repository: https://github.com/gocolly/colly). You will typically use this tag together with the main tag [go].

63 questions
0
votes
1 answer

How to add the start of a url to a colly link list

I'm somewhat new to Go and am trying to scrape several webpages using colly. Two of the pages have incomplete links; below are the code and output: func PaloNet() { c := colly.NewCollector( …
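
For the question above about incomplete links, a minimal sketch using only the standard library: a relative href is resolved against the URL of the page it was found on. The helper name resolveLink and the example.com URLs are assumptions for illustration, not taken from the asker's code. Colly's Request type also has an AbsoluteURL method that may serve the same purpose inside a callback.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolveLink turns a possibly relative href into an absolute URL,
// using the page it was found on as the base. (Hypothetical helper.)
func resolveLink(pageURL, href string) (string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(href)
	if err != nil {
		return "", err
	}
	return base.ResolveReference(ref).String(), nil
}

func main() {
	// example.com and these paths are placeholders, not the asker's sites.
	hrefs := []string{"/news/item-1", "item-2", "https://other.example/x"}
	for _, href := range hrefs {
		abs, err := resolveLink("https://example.com/blog/", href)
		if err != nil {
			fmt.Println("skipping link:", err)
			continue
		}
		fmt.Println(abs)
	}
}
```

Resolving against the page URL rather than concatenating a fixed prefix handles root-relative, path-relative, and already absolute links the same way.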
0
votes
0 answers

Why is string not written in destination file using go colly?

I have a web scraper and I need to write a string from HTML code to my CSV file. The HTML code looks like this: Bucuresti - Ilfov, Bucuresti, …</div> <div class=…
DvdiidI
  • 63
  • 6
0
votes
1 answer

How can I get with go colly some text that is placed inside a div?

I have a web scraper and I'm trying to get some text and write it in a CSV file. The HTML structure is: I have a div with class="css-1nrl4q4"; inside this div I have another div without class, and inside this second div I have two p elements that…
DvdiidI
  • 63
  • 6
0
votes
1 answer

Scrape description from a website with go-colly

I am trying to scrape the description from the website (img), but I do not understand how to get there. My attempt: pg := Program{} slPG := []Program{} c.OnHTML(".short", func(e *colly.HTMLElement) { pg.Name = e.ChildText("h2.short-cat") pg.Link =…
Eno Ron
  • 11
  • 2
0
votes
1 answer

Iterate over HTMLElement attributes with colly?

As seen in the HTMLElement struct, attributes is a private field: // HTMLElement is the representation of a HTML tag. type HTMLElement struct { // Name is the name of the tag Name string Text string attributes…
danthegoodman
  • 501
  • 4
  • 10
0
votes
1 answer

Running Colly web scraper periodically using cron in Go

I was doing some web scraping with colly and wanted to run it periodically using cron. I tried a basic approach: type scraper struct { coll *colly.Collector rc *redis.Client } func newScraper(c *colly.Collector, rc…
0
votes
1 answer

Golang Colly Scraping - Website Captcha Catches My Scrape

I wrote a scraper for Amazon product titles, but the Amazon captcha catches it. I ran go run main.go 10 times: 8 times it was caught, and 2 times I scraped the product title. I researched this but did not find any solution for Golang (there is…
Melisa
  • 310
  • 2
  • 16
0
votes
1 answer

Retry request in go-colly

I have this scraper library, and I would like to change my user agent if the first user agent returns an error, but this code doesn't work. If the first user agent doesn't work, I send a second attempt, but it never finishes since onHTML is not…
nanakondor
  • 615
  • 11
  • 25
0
votes
1 answer

Colly not finding the body tag by xpath but finding it by selector name

I'm learning web scraping using gocolly. When I try to find the tag using selector name body, it successfully finds it. However, when I try to find the body tag by xpath /html/body, it fails to find it. I have used OnHTML() with a simple callback…
kkin
  • 33
  • 2
  • 6
0
votes
0 answers

Why is Go Colly Collector not always finding SVG tag?

I am trying to write a simple web scraper in Go using Colly. The program is supposed to visit an earnings calendar for a particular date range on yahoo finance and then spiral out and visit each Stock Ticker page that shows up in the list. The…
0
votes
1 answer

problems with noscript when scraping using go-colly

So I'm writing a scraping script for a website. Scraping the text succeeds; only scraping the image fails. When I inspect the element the code still looks normal, but when I view the page source the image wrapper changes to noscript. So I…
0
votes
1 answer

Colly Max Depth and encoding/json - null

I have gone through the Go tour and I'm now going through some of the Colly tutorials. I understand the max depth and have been trying to implement it in a Go program like so: package main import ( "encoding/json" "log" "net/http" …
majordomo
  • 1,160
  • 1
  • 15
  • 34
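
For the encoding/json question above, the excerpt does not show where the null comes from, so this is only one possible cause: a slice that was never appended to is nil, and encoding/json marshals a nil slice as null rather than []. The helper encode is an assumption for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encode marshals v and returns the JSON text. (Hypothetical helper.)
func encode(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	var nilLinks []string    // declared but never appended to
	emptyLinks := []string{} // initialized, zero length

	fmt.Println(encode(nilLinks))   // null
	fmt.Println(encode(emptyLinks)) // []
}
```

If no callback ever appends to the slice, for example because the depth limit or selector matched nothing, the result stays nil and the response body is null.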
0
votes
1 answer

Scrape ONLY a certain element

I'm trying to make a web scraper using gocolly. I want to scrape ONLY the element with the id of dailyText on https://wol.jw.org/en/wol/h/r1/lp-e. How can I do this?
altude
  • 41
  • 6
0
votes
1 answer

How to bypass re-captcha with gocolly twocaptcha and selenium

After several requests, my scraping code gets blocked by the target site with reCAPTCHA. I use https://github.com/gocolly/twocaptcha to bypass the captcha with the Selenium Chrome driver. It works when bypassing with the Selenium Chrome driver, but when I run my scraping…
dikutandi
  • 127
  • 1
  • 6
0
votes
1 answer

How to hook go-colly to elasticsearch?

What change do I make in the code below to index into Elasticsearch using go-colly? I want to get the full text (strip HTML, strip JS, render if needed), then conform it to an Avro schema {pageurl: , title:, content:}, and bulk-post to a specific elastic-search…
Espresso
  • 5,378
  • 4
  • 35
  • 66