<speak><voice name=\"en-US-JennyNeural\"><prosody rate=\"1\">aaaaaaaa<break time=\"5s\"/> bbbb. <br time=\"2s\"/>ccccccdddddddd </prosody></voice></speak>

I parse it with this code and read the result back:

    doc, err := goquery.NewDocumentFromReader(strings.NewReader(text))
    if err != nil {
        return "", err
    }
    ssml, err := doc.Find("html body").Html()
    if err != nil {
        return "", err
    }

Result:

<speak><voice name="en-US-JennyNeural"><prosody rate="1">aaaaaaaa<break time="5s"> bbbb. <br time="2s"/>ccccccdddddddd </break></prosody></voice></speak>

I think <break/> is not parsed correctly: everything after it ends up nested inside the <break> element. I want <break/> to be parsed as an empty element, like <br/>.

icza

2 Answers


Assuming you're using github.com/PuerkitoBio/goquery, it uses golang.org/x/net/html under the hood for HTML parsing, which is an HTML5-compliant tokenizer and parser.

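<br> and <break> are parsed differently because <br> is an HTML void element: it never has content or a closing tag. <break> is not an HTML element at all, so the parser treats it as an ordinary unknown element. For such elements the HTML5 parser ignores the trailing slash in <break time="5s"/> and treats the tag as an opening tag, so the content that follows becomes its children until the parser closes it implicitly.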

If you want goquery to handle your HTML properly, you must use an explicit closing tag for <break> instead of the self-closing tag.

E.g. instead of this:

<break time="5s"/>

You must use this:

<break time="5s"></break>

With this change, the output becomes:

<speak><voice name="en-US-JennyNeural"><prosody rate="1">aaaaaaaa<break time="5s"></break> bbbb. <br time="2s"/>ccccccdddddddd </prosody></voice></speak>
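If you can't change the SSML at its source, one option is to rewrite the self-closing tags just before handing the text to goquery. Below is a minimal sketch of that idea using only the standard library. The helper name expandBreakTags is made up for illustration, and the regular expression assumes that attribute values never contain a ">" character.

```go
package main

import (
	"fmt"
	"regexp"
)

// selfClosingBreak matches <break .../> and captures its attributes.
// The \b keeps longer tag names such as <breakfast/> from matching.
// Assumption: attribute values do not contain '>'.
var selfClosingBreak = regexp.MustCompile(`<break\b([^>]*?)\s*/>`)

// expandBreakTags (hypothetical helper) rewrites every self-closing <break/>
// as an explicit <break></break> pair, leaving all other tags untouched.
func expandBreakTags(ssml string) string {
	return selfClosingBreak.ReplaceAllString(ssml, "<break${1}></break>")
}

func main() {
	in := `<prosody rate="1">aaaaaaaa<break time="5s"/> bbbb. <br time="2s"/>cc</prosody>`
	fmt.Println(expandBreakTags(in))
	// Prints:
	// <prosody rate="1">aaaaaaaa<break time="5s"></break> bbbb. <br time="2s"/>cc</prosody>
}
```

The result of expandBreakTags can then be passed to goquery.NewDocumentFromReader as in the question. A regular expression is not a real parser, so treat this as a stopgap for simple, machine-generated SSML.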
icza
0
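    // Requires the imports "bytes", "encoding/xml", "io" and "strings".
    // Round-trip the SSML through encoding/xml instead of an HTML parser:
    // the XML decoder understands self-closing tags such as <break/>.
    d := xml.NewDecoder(strings.NewReader(text))
    var writer bytes.Buffer
    e := xml.NewEncoder(&writer)
    for {
        t, err := d.Token()
        if err == io.EOF {
            break
        }
        if err != nil {
            return "", err
        }
        // Copy every token through unchanged. A self-closing element arrives
        // as a StartElement followed by an EndElement, so the encoder writes
        // it as an explicit open/close pair.
        if err := e.EncodeToken(t); err != nil {
            return "", err
        }
    }
    if err := e.Flush(); err != nil {
        return "", err
    }

    return writer.String(), nil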
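To see what this round trip produces, here is a self-contained sketch that wraps the same loop in a function and runs it on a shortened version of the question's input. The function name normalizeSSML is my own choice, not something from the original code.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// normalizeSSML (hypothetical name) copies every XML token from the input to
// an encoder, which writes explicit start and end tags for each element.
func normalizeSSML(text string) (string, error) {
	d := xml.NewDecoder(strings.NewReader(text))
	var buf bytes.Buffer
	e := xml.NewEncoder(&buf)
	for {
		t, err := d.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			return "", err
		}
		if err := e.EncodeToken(t); err != nil {
			return "", err
		}
	}
	if err := e.Flush(); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	in := `<speak>aaaaaaaa<break time="5s"/> bbbb. <br time="2s"/>cc</speak>`
	out, err := normalizeSSML(in)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
	// Prints:
	// <speak>aaaaaaaa<break time="5s"></break> bbbb. <br time="2s"></br>cc</speak>
}
```

Note that the encoder also expands <br time="2s"/> to <br time="2s"></br>. That is fine for XML consumers, but as far as I know an HTML5 parser treats a </br> end tag as another <br> element, so this output may not be what you want if it is fed back into goquery.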