
I'm trying to write a test suite that checks whether HTML fragments/files are canonically equivalent to one another. I was surprised to see that if I parse the same string or file twice, the resulting https://godoc.org/golang.org/x/net/html#Node trees compare as different. What am I missing?

Hopefully this demonstrates the issue:

package main

import (
    "fmt"
    "strings"

    "golang.org/x/net/html"
)

func main() {
    s := `<h1>
    test
    </h1><p>foo</p>`
    // s2 := `<h1>test</h1><p>foo</p>`

    doc, _ := html.Parse(strings.NewReader(s))
    doc2, _ := html.Parse(strings.NewReader(s))

    if doc == doc2 {
        fmt.Println("HTML is the same") // Expected this
    } else {
        fmt.Println("HTML is not the same") // Got this
    }
}

HTML is not the same

hendry

1 Answer


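The simplest way is to use reflection. html.Parse returns a *Node, so doc == doc2 compares two pointers, and those are always different because each call to html.Parse allocates a new tree. To compare the trees themselves (the values the pointers refer to), use reflect.DeepEqual: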

// requires: import "reflect"
if reflect.DeepEqual(doc, doc2) {
    fmt.Println("HTML is the same")
} else {
    fmt.Println("HTML is not the same")
}

This prints out "HTML is the same".

Pandemonium
  • If there is a space, it says it's not the same. :/ http://s.natalian.org/2016-05-29/so.go – hendry May 28 '16 at 23:59
  • I ended up doing a simpler strings.Contains based test https://github.com/kaihendry/toc/blob/master/toc_test.go – hendry Jan 20 '17 at 04:20
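reflect.DeepEqual compares text nodes byte for byte, so any whitespace difference makes the trees unequal. One possible workaround, sketched here under stated assumptions, is to normalize text nodes before comparing: collapse whitespace runs to single spaces, trim them, and drop text nodes that become empty. The nodeType/node types and the normalize and equivalent helpers are hypothetical; the real html.Node links children through FirstChild/NextSibling rather than a slice, so a port would have to walk those links instead. The sketch also assumes whitespace is insignificant, which does not hold inside <pre> or <textarea>, or between inline elements.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// nodeType and node are hypothetical stand-ins for html.NodeType and
// html.Node, reduced to what this sketch needs.
type nodeType int

const (
	elementNode nodeType = iota
	textNode
)

type node struct {
	Type     nodeType
	Data     string
	Children []*node
}

// normalize returns a copy of n in which every text node has its whitespace
// runs collapsed to single spaces and trimmed; text nodes that end up empty
// are dropped. Assumes whitespace is insignificant everywhere.
func normalize(n *node) *node {
	out := &node{Type: n.Type, Data: n.Data}
	if n.Type == textNode {
		out.Data = strings.Join(strings.Fields(n.Data), " ")
	}
	for _, c := range n.Children {
		nc := normalize(c)
		if nc.Type == textNode && nc.Data == "" {
			continue // whitespace-only text node: skip it
		}
		out.Children = append(out.Children, nc)
	}
	return out
}

// equivalent reports whether two trees match after normalization.
func equivalent(a, b *node) bool {
	return reflect.DeepEqual(normalize(a), normalize(b))
}

func main() {
	// Roughly the two <h1> fragments from the question.
	a := &node{Type: elementNode, Data: "h1",
		Children: []*node{{Type: textNode, Data: "\n    test\n    "}}}
	b := &node{Type: elementNode, Data: "h1",
		Children: []*node{{Type: textNode, Data: "test"}}}

	fmt.Println(reflect.DeepEqual(a, b)) // false: the text differs
	fmt.Println(equivalent(a, b))        // true
}
```

Normalizing a copy rather than mutating the input keeps the original trees intact, which is convenient in a test suite that may reuse parsed fixtures.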