I am learning Go for web crawling. I would like to extract some text from the following site: "https://edition.cnn.com/markets/fear-and-greed"

This site needs some time to load all of its HTML, so I used chromedp to get the text from it.

However, when I run this script, I get no response. The code is:

package main

import (
    "context"
    "log"
    "strings"

    "github.com/chromedp/chromedp"
)

func main() {
    opts := append(chromedp.DefaultExecAllocatorOptions[:],
        chromedp.Flag("headless", false),
    )

    ctx, cancel := chromedp.NewExecAllocator(context.Background(), opts...)
    defer cancel()

    ctx, cancel = chromedp.NewContext(ctx)
    defer cancel()

    var res string
    err := chromedp.Run(ctx,
        chromedp.Navigate("https://edition.cnn.com/markets/fear-and-greed"),
        chromedp.Text(".market-fng-gauge__dial-number-value", &res, chromedp.NodeVisible),
    )
    if err != nil {
        log.Fatal(err)
    }

    log.Println(strings.TrimSpace(res))
}

What is wrong? I really want to scrape this site using Go. Please let me know how to do it.

puraria

2 Answers


I am not sure why you are not getting a result, but it seems like chromedp is a little too involved for your task. You might prefer looking at https://github.com/antchfx/htmlquery, which is a much simpler package for finding various elements inside of an HTML document.

  • The reason I use chromedp is that the site "https://edition.cnn.com/markets/fear-and-greed" needs some time to load all of its HTML. For that reason, I want to use a Selenium-style web crawling package. I tried your recommendation, but it does not work; it does not seem to support parsing dynamic HTML. Can you show me the code to get the text from the node I want? – puraria Aug 27 '22 at 07:20

Change your code like this:

diff --git a/main.go b/main.go
index dbc75b3..51521a8 100644
--- a/main.go
+++ b/main.go
@@ -23,7 +23,7 @@ func main() {
    var res string
    err := chromedp.Run(ctx,
        chromedp.Navigate("https://edition.cnn.com/markets/fear-and-greed"),
-       chromedp.Text(".market-fng-gauge__dial-number-value", &res, chromedp.NodeVisible),
+       chromedp.Text(".market-fng-gauge__dial-number-value", &res, chromedp.ByQuery, chromedp.NodeVisible),
    )
    if err != nil {
        log.Fatal(err)

Explanation

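Note that, by default, the chromedp.Query action (on which query actions such as chromedp.Text are built) uses the chromedp.BySearch option, which wraps DOM.performSearch. That search returns every result matched by treating the selector as plain text, as a CSS selector, or as an XPath expression, so the result is not necessarily the element you meant.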

So you should specify chromedp.ByQuery, which selects a single element with DOM.querySelector, to make sure it returns the node you want.

See here for more information: https://github.com/chromedp/chromedp/issues/936#issuecomment-951480271

Zeke Lu