I'm a beginner with golang, writing an XML parser.

I would like to check whether the XML file is formatted correctly, i.e. detect missing brackets or misspelled element and attribute names. If there are any such mistakes, the code should report an error telling users to correct them.

Let's take a concrete example of an xml file, example.xml:

<?xml version="1.0" encoding="utf-8"?>

<servers version="1">
    <server>
        <model name="Cisco" type="modelA"></model>
        <serverName>Tokyo_VPN</serverName>
        <serverIP>127.0.0.1</serverIP>
    </server>
    <server>
        <model name="Dell" type="modelB"></model>
        <serverName>Moscow_VPN</serverName>
        <serverIP>127.0.0.2</serverIP>
    </server>
</servers>

Using the standard Go package "encoding/xml", it's straightforward to define structures and parse the XML as follows:

package main

import (
    "encoding/xml"
    "fmt"
    "io/ioutil"
    "os"
)

type Servers struct {
    XMLName     xml.Name `xml:"servers"`
    Version     string   `xml:"version,attr"`
    Svs         []server `xml:"server"`
}

type server struct {
    XMLName    xml.Name `xml:"server"`
    Model      model    `xml:"model"`
    ServerName string   `xml:"serverName"`
    ServerIP   string   `xml:"serverIP"`
}

type model struct {
    XMLName    xml.Name   `xml:"model"` 
    Name       string     `xml:"name,attr"`
    Type       string     `xml:"type,attr"`  
}


func main() {

    // open the xml file
    file, err := os.Open("example.xml")
    if err != nil {
        fmt.Printf("error: %v", err)
        return
    }
    defer file.Close()

    // read the whole file into a byte slice, checking for read errors
    byteValue, err := ioutil.ReadAll(file)
    if err != nil {
        fmt.Printf("error: %v", err)
        return
    }

    var allservers Servers

    err = xml.Unmarshal(byteValue, &allservers)
    if err != nil {
        fmt.Printf("error: %v", err)
        return
    }

    fmt.Println(allservers)
}

Mistakes such as missing brackets, e.g.

<model name="Cisco" type="modelA"></model

or misspelled element names that leave the opening and closing tags mismatched, e.g.

<serverNammme>Moscow_VPN</serverName>

are already caught: xml.Unmarshal reports them as XML syntax errors.

Other mistakes are not caught, though. For example, misspelled attribute names:

<model namMMe="Cisco" typeE="modelA"></model>

Although this is valid XML, I would like to treat it as an error, because (for my purposes) these are spelling mistakes in the input file that should be corrected.

Instead, it is parsed without any error, and the Name and Type fields of the first model are silently left empty:

{{ servers} 1 [{{ server} {{ model}  } Tokyo_VPN 127.0.0.1} {{ server} {{ model} Dell modelB} Moscow_VPN 127.0.0.2}]}

How can I detect these mistakes and return an error?

EB2127
  • "Is there a standard way to check for these errors?" No. "Is it possible to write these sorts of checks when unmarshalling the xml?" It's *impossible* to *not* write these checks. They are a natural feature of the tokenizer/lexer for syntactic errors, such as missing brackets or quotes; and of the parser for semantic errors such as mismatched tag names. Your last example is valid XML and will not produce an error. The XML document may not conform to some expected schema, but that has nothing to do with parsing. Go does not have exceptions. – Peter Sep 03 '20 at 06:03
  • @Peter "Your last example is valid XML and will not produce an error." Right. However, while it may be valid XML, it is an invalid file format for my purposes. Therefore, I would like to catch this mistake, and throw an error. By "exceptions", I mean throwing an error---I have edited to make this clear. "It's impossible to not write these checks. They are a natural feature of the tokenizer/lexer for syntactic errors, such as missing brackets or quotes; and of the parser for semantic errors such as mismatched tag names" Yes, I've tried to edit the question to avoid this confusion. – EB2127 Sep 03 '20 at 14:57
  • @Peter Hopefully it makes sense what I'm asking now; I think there was some confusion---I'm happy to edit the question further. – EB2127 Sep 03 '20 at 15:14

1 Answer

The documentation for encoding/xml includes an example of a custom marshaler and unmarshaler:

https://golang.org/pkg/encoding/xml/#Unmarshal

You just need to implement the xml.Unmarshaler interface. Your custom UnmarshalXML method can then check the values as it unmarshals them and return an error when something is wrong.
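As a rough sketch of that idea (my own illustration, not the example from the docs), the model type from the question could implement UnmarshalXML and reject any attribute it does not know. The parseServer helper and the error wording are assumptions made up for this example, and the sketch only checks attributes, not unknown child elements.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

type model struct {
	Name string
	Type string
}

// UnmarshalXML implements xml.Unmarshaler. It copies the known attributes
// and returns an error for any attribute name it does not recognise.
func (m *model) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
	for _, attr := range start.Attr {
		switch attr.Name.Local {
		case "name":
			m.Name = attr.Value
		case "type":
			m.Type = attr.Value
		default:
			return fmt.Errorf("element <%s>: unknown attribute %q",
				start.Name.Local, attr.Name.Local)
		}
	}
	// UnmarshalXML must consume the whole element, so skip to its end tag.
	return d.Skip()
}

type server struct {
	Model      model  `xml:"model"`
	ServerName string `xml:"serverName"`
	ServerIP   string `xml:"serverIP"`
}

// parseServer is a hypothetical helper that decodes a single <server>.
func parseServer(data []byte) (server, error) {
	var s server
	err := xml.Unmarshal(data, &s)
	return s, err
}

func main() {
	good := []byte(`<server><model name="Cisco" type="modelA"></model>` +
		`<serverName>Tokyo_VPN</serverName><serverIP>127.0.0.1</serverIP></server>`)
	bad := []byte(`<server><model namMMe="Cisco" typeE="modelA"></model>` +
		`<serverName>Tokyo_VPN</serverName><serverIP>127.0.0.1</serverIP></server>`)

	for _, doc := range [][]byte{good, bad} {
		s, err := parseServer(doc)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%+v\n", s)
	}
}
```

The same pattern could be extended to report missing required attributes by checking for empty fields after the loop.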

Shubham Srivastava