I just published an open source event-based HTML 5.0 compliant parsing package for Go. You can find it here
Here is the sample code to get all the links from a page (from A elements):
// Collect the href value of every A element on the page.
var links []string
parser := NewParser(htmlContent)
parser.Parse(nil, func(e *HtmlElement, isEmpty bool) {
    // Match anchor tags; "link" would match LINK elements instead.
    if e.TagName == "a" {
        link, _ := e.GetAttributeValue("href")
        if link != "" {
            links = append(links, link)
        }
    }
}, nil)
A few things to keep in mind:
- The links are returned exactly as they appear in the href attribute, so they may be relative links rather than full URLs
- Dynamically generated links will not be collected
- There are other links not being collected (META tags, images, iframes, etc.). It's pretty easy to modify this code to collect those.
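If you need full URLs, one option is to resolve each collected href against the address of the page it came from. Here is a minimal sketch using only the standard net/url package. Note that resolveLinks and the example.com addresses are my own illustration and not part of the parser package, and the sketch ignores any BASE element the page might declare.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolveLinks is a hypothetical helper (not part of the parser package).
// It resolves each href against pageURL and returns absolute URLs.
// Hrefs that cannot be parsed are skipped.
func resolveLinks(pageURL string, hrefs []string) ([]string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return nil, err
	}
	resolved := make([]string, 0, len(hrefs))
	for _, href := range hrefs {
		// URL.Parse resolves href in the context of base (RFC 3986 rules).
		u, err := base.Parse(href)
		if err != nil {
			continue
		}
		resolved = append(resolved, u.String())
	}
	return resolved, nil
}

func main() {
	// Assumed sample input; in practice this would be the links slice
	// filled in by the parser callback.
	links := []string{"../about.html", "/img/logo.png", "page2.html"}
	abs, err := resolveLinks("https://example.com/docs/index.html", links)
	if err != nil {
		fmt.Println("bad page URL:", err)
		return
	}
	for _, l := range abs {
		fmt.Println(l)
	}
}
```

Links that are already absolute pass through unchanged, so it should be safe to run this over everything the callback collects.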