I'm writing a scraping script for a website. Scraping the text works; only scraping the images fails. When I use Inspect Element the markup looks normal, but in View Source the code wrapping the images is inside a <noscript> tag. I suspect that is the cause. Can anyone help?

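c.OnHTML(".postarea", func(h *colly.HTMLElement) {
    as := Image{}
    // The title is plain text in the served HTML, so this part works.
    as.Name = h.ChildText(".headpost .entry-title")
    h.ForEach(".maincontent", func(i int, x *colly.HTMLElement) {
        ya := So{}
        // ChildAttr returns the attribute of the first matching element only;
        // this is the call that comes back empty.
        ya.Url = x.ChildAttr("#readerarea img", "src")
        as.Image = append(as.Image, ya)
    })
    // Serialize the result and write it to the HTTP response.
    b, err := json.MarshalIndent(as, "", " ")
    if err != nil {
        log.Println("failed to serialize response:", err)
        return
    }
    w.Header().Add("Content-Type", "application/json")
    w.Write(b)
})
c.OnRequest(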

This is the sample HTML:

<div id="readerarea"><noscript>
        <p><img loading="lazy"
                src="#" alt=""
                width="725" height="1024" class="alignnone size-full wp-image-72251" /><img loading="lazy"
                src="#" alt=""
                width="725" height="1024" class="alignnone size-full wp-image-72251" /><img loading="lazy"
                src="#" alt=""
                width="725" height="1024" class="alignnone size-full wp-image-72251" /><img loading="lazy"
                src="#" alt=""
                width="725" height="1024" class="alignnone size-full wp-image-72251" />
        </p>
    </noscript>
</div>

1 Answer

There will be some JavaScript on the page that updates this (the <noscript> section is for browsers without JavaScript). When you 'view source' you are seeing the raw HTML as delivered by the server; with 'Inspect Element' you see the DOM as it stands (i.e. after whatever script that updates this section has run).

Colly does not run JavaScript, so you will need another approach. Options include reading the page's JavaScript to see how it locates the images, or using something like chromedp instead of Colly.
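A further possibility, offered only as a sketch: the raw HTML in the question already carries the <img> tags inside <noscript>. As far as I know, the HTML parser underneath Colly treats <noscript> contents as plain text by default, which would explain why the "#readerarea img" selector finds nothing. If that holds for your page, you could read that text (for example with h.ChildText("#readerarea noscript"), assuming it returns the raw markup) and pull the src values out of it. The helper name extractImgSrcs, the example.com URLs and the regular expression are illustrative assumptions, and a regex is a shortcut here rather than a real HTML parser.

```go
package main

import (
	"fmt"
	"regexp"
)

// imgSrcRe captures the quoted value of the src attribute of an <img> tag.
// It requires whitespace before "src" so that attributes such as data-src
// are not picked up by mistake.
var imgSrcRe = regexp.MustCompile(`(?i)<img\b[^>]*?\ssrc\s*=\s*["']([^"']*)["']`)

// extractImgSrcs is a hypothetical helper: it returns every img src found
// in a fragment of HTML text, in the order the tags appear.
func extractImgSrcs(fragment string) []string {
	var srcs []string
	for _, m := range imgSrcRe.FindAllStringSubmatch(fragment, -1) {
		srcs = append(srcs, m[1])
	}
	return srcs
}

func main() {
	// Stand-in for the text of the <noscript> element; in the Colly callback
	// this is assumed to come from h.ChildText("#readerarea noscript").
	noscript := `<p><img loading="lazy"
		src="https://example.com/1.jpg" alt=""
		width="725" height="1024" /><img loading="lazy"
		src="https://example.com/2.jpg" alt=""
		width="725" height="1024" /></p>`

	for _, src := range extractImgSrcs(noscript) {
		fmt.Println(src)
	}
}
```

Inside the OnHTML callback, each returned string would be appended to as.Image in place of the single ChildAttr call. If the src values in the served HTML turn out to be placeholders, this will not help and a JavaScript-capable tool such as chromedp is still needed.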

Brits