I'm trying to parse a string into a regular JSON struct in Go. I don't control the original string, and it may contain unwanted characters like this:

originalstring := `{"os": "\u001C09:@>A>DB Windows 8.1 \u001A>@?>@0B82=0O"}`
input := []byte(originalstring)
var event JsonStruct
parsingError := json.Unmarshal(input, &event)

When I try to parse it in Go, I get this error:

 invalid character '\x1c' in string literal

I previously handled this in Java like this:

event = charset.decode(charset.encode(event)).toString();
eventJSON = new JsonObject(event);

Any idea?

avillagomez
  • Here you go: http://play.golang.org/p/DK8gJgF8JU – avillagomez Feb 18 '16 at 23:03
  • The `\u` disappears once you print it, but it causes problems when you try to parse JSON with one of those characters in it – avillagomez Feb 18 '16 at 23:05
  • You may have to sanitize the input. This might help: http://stackoverflow.com/questions/20401873/remove-invalid-utf-8-characters-from-a-string-go-lang – JimB Feb 18 '16 at 23:18

2 Answers

You need to convert control characters to Unicode escape sequences in the notation \uYYYY, where each Y is a hexadecimal digit. A working example:

package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "unicode"
)

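// convert rewrites every control character in input as a JSON \uXXXX escape
// and copies all other runes through unchanged.
func convert(input string) string {
    var buf bytes.Buffer
    for _, r := range input {
        if unicode.IsControl(r) {
            fmt.Fprintf(&buf, "\\u%04X", r)
        } else {
            buf.WriteRune(r)
        }
    }
    return buf.String()
}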

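func main() {
    // The raw 0x1A control byte is written as an explicit escape so it survives copy and paste.
    input := convert(`{"os": "09:@>A>DB Windows 8.1 ` + "\x1a" + `>@?>@0B82=0O"}`)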
    fmt.Println(input)
    js := []byte(input)

    t := struct {
        OS string
    }{}

    err := json.Unmarshal(js, &t)
    fmt.Println("error:", err)
    fmt.Println(t)
}

Which produces:

{"os": "09:@>A>DB Windows 8.1 \u001A>@?>@0B82=0O"}
error: <nil>
{09:@>A>DB Windows 8.1 >@?>@0B82=0O}
tumdum

According to the JSON standard (ECMA-404, also RFC 8259), control characters inside strings must be escaped for the JSON to be valid. If you want to preserve the control characters, you'll have to turn them into valid escape sequences; if you don't, you'll have to remove them before unmarshaling.

Here is an implementation of the latter:

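package main

import (
    "encoding/json"
    "fmt"
    "strings"
)

// stripCtlFromUTF8 drops ASCII control characters (code points below 32, plus DEL).
func stripCtlFromUTF8(str string) string {
    return strings.Map(func(r rune) rune {
        if r >= 32 && r != 127 {
            return r
        }
        return -1
    }, str)
}

func main() {
    // The control bytes from the question, written as explicit escapes so they survive copy and paste.
    js := []byte(stripCtlFromUTF8(`{"os": "` + "\x1c" + `09:@>A>DB Windows 8.1 ` + "\x1a" + `>@?>@0B82=0O"}`))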

    t := struct {
        OS string
    }{}

    err := json.Unmarshal(js, &t)
    fmt.Println("error:", err)
    fmt.Println(t)
}

On the playground: http://play.golang.org/p/QRtkS8LF1z
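
To see the escaping rule in isolation, here is a small sketch that feeds encoding/json the same value twice: once with a real 0x1C byte inside the string, and once with the six-character escape \u001C. The helper parseOS and the shortened sample values are assumptions made for this illustration. Note that a Go raw string (backticks) keeps \u001C as literal text, whereas "\x1c" in an interpreted string produces the actual control byte.

```go
package main

import (
    "encoding/json"
    "fmt"
)

// parseOS (hypothetical helper) decodes a document of the form {"os": ...}
// and returns the value together with any error from encoding/json.
func parseOS(doc string) (string, error) {
    var t struct{ OS string }
    err := json.Unmarshal([]byte(doc), &t)
    return t.OS, err
}

func main() {
    // Interpreted string: \x1c becomes a real 0x1C byte inside the JSON string.
    raw := "{\"os\": \"Windows 8.1 \x1c\"}"
    // Raw string: the characters \u001C stay as written, which is a valid JSON escape.
    escaped := `{"os": "Windows 8.1 \u001C"}`

    _, err := parseOS(raw)
    fmt.Println("raw:", err)

    value, err := parseOS(escaped)
    fmt.Printf("escaped: %q %v\n", value, err)
}
```

The first call should fail with the "invalid character" error from the question, and the second should succeed and return a string containing the control character.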

Daniel Kamil Kozar
Zikes