I'm trying to use Go and Colly to scrape a few details about some listings on Zillow. Here's the script I'm using:

package main

import (
    "encoding/csv"
    "log"
    "os"
    "time"

    "github.com/gocolly/colly"
    "github.com/gocolly/colly/proxy"
)

func main() {
    // filename for data
    fName := "data.csv"
    // create a file
    file, err := os.Create(fName)
    // check for errors
    if err != nil {
        log.Fatalf("Could not create file, error : %q", err)
        return
    }
    // close file afterwards
    defer file.Close()

    // instantiate a csv writer
    writer := csv.NewWriter(file)
    // flush contents afterwards
    defer writer.Flush()

    // instantiate a collector
    c := colly.NewCollector(
        colly.AllowedDomains("https://www.zillow.com/austerlitz-ny/sold/"),
    )

    // point to the webpage structure you need to fetch
    c.OnHTML(".list-card-info", func(e *colly.HTMLElement) {
        // write the desired data into csv
        writer.Write([]string{
            e.ChildText("h1"),
            e.ChildText("a"),
        })
    })

    // show completion
    log.Printf("Scraping Finished\n")
    log.Println(c)
}

The script runs with no errors but collects no data. The terminal output is "Requests made: 0 (0 responses) | Callbacks: OnRequest: 0, OnHTML: 1, OnResponse: 0, OnError: 0", and data.csv is empty as well.

Any idea on why this is happening and how to resolve it?

Hugo Smith

1 Answer

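Read the colly examples first. Below is a demo. Colly only sends requests and passes the responses to your callbacks once you call c.Visit. Your script registers an OnHTML callback but never calls c.Visit, which is why the log shows 0 requests.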

package main

import (
    "fmt"

    "github.com/gocolly/colly"
)

func main() {
    c := colly.NewCollector()

    // Find and visit all links
    c.OnHTML("a", func(e *colly.HTMLElement) {
        e.Request.Visit(e.Attr("href"))
    })

    // Log every request before it is sent
    c.OnRequest(func(r *colly.Request) {
        fmt.Println("Visiting", r.URL)
    })

    // Visit starts the crawl; the callbacks above only fire after this call
    c.Visit("http://go-colly.org/")
}
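
One more thing worth checking in the question's script, separate from the missing c.Visit call: as far as I know, colly.AllowedDomains expects bare hostnames such as www.zillow.com, not full URLs, so the value passed in the question would probably not match any request. Here is a small standard-library sketch that derives the hostname from the start URL. Note that allowedHost is a hypothetical helper for illustration, not part of colly.

```go
package main

import (
	"fmt"
	"net/url"
)

// allowedHost is a hypothetical helper (not a colly function). It reduces a
// full URL to its bare hostname, the form a domain allow-list is assumed to
// expect.
func allowedHost(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	// A string without a scheme parses as a path, leaving the host empty.
	if u.Hostname() == "" {
		return "", fmt.Errorf("no hostname in %q", rawURL)
	}
	return u.Hostname(), nil
}

func main() {
	start := "https://www.zillow.com/austerlitz-ny/sold/"

	host, err := allowedHost(start)
	if err != nil {
		fmt.Println("error:", err)
		return
	}

	fmt.Println("allow-list entry:", host)
	fmt.Println("URL to visit:", start)
}
```

The hostname would then go into colly.AllowedDomains and the full URL into c.Visit. c.Visit returns an error, so logging that return value should also make a blocked or failed request visible instead of the script finishing silently.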
HuDahai
  • Hmm, this seems to work on another site however. Is it possible there's something unique about how this site is interfacing with Colly? I'm not getting any error codes which is making it harder to debug. – Hugo Smith May 24 '22 at 21:13