I'm having trouble with something in Go and I'm not sure where to look. I'm fetching a UTF-8 string from a MySQL database, and attempting to return it in a JSON response to a client.

Different clients react differently, but iOS NSJSONSerialization returns an "Unescaped control character" error. This breaks the whole application. I can decode the JSON without issue in Chrome using JSON.parse(), though.

On the server side, the same generator function written in a language other than Go works fine. Help?


EDIT: Here is the JSON that is causing the issue:

{ "test":"☮️" }

... If I omit this emoji, it works. If it's there, it doesn't work. The issue seems to be something related to there being two different encodings for certain emoji. One seems to trip up Go, but they are both valid.

To demonstrate the difference in encoding, some of the emoji show up in the database explorer and some do not:

screenshot

... These ones that appear in the database explorer are causing this issue with 100% reproducibility. However, all of them usually appear in the actual client software (not the database explorer) without issue. I don't know if there's a way to reconfigure the database connection to avoid this (or something), but it seems to work with different instances depending on what is doing the decoding and how forgiving it is. Considering that users could type or copy/paste either encoding... this needs to work consistently.

Any help would be appreciated. Thanks in advance.

Ben Guild
  • It would be extremely helpful to include the JSON that is causing the error. –  May 31 '16 at 01:27
  • Posted. It's one of the emoji characters. Without it, it works fine. – Ben Guild May 31 '16 at 01:59
  • The value of `test` in your JSON is `\u262e\ufe0f`. The first character is `PEACE SYMBOL`; the second is `VARIATION SELECTOR-16`. I suspect the presence or absence of the latter is what causes the emoji to be displayed or not, and the error as well. Please confirm. – noisypixy May 31 '16 at 02:36
  • I've only tried it with and without the emoji entirely, but I'm guessing you're probably right on the second character. Still, the problem I've got is that I cannot control the user input but for some reason Golang's translation of this is considered invalid by the client, yet the same script in PHP returns fine without any processing issue. (fetches from the same DB) – Ben Guild May 31 '16 at 04:50
  • It works fine for me. Maybe the "database explorer" is broken? – Rick James Jun 06 '16 at 05:26
  • I think it is, but the problem is solved below in the comments. – Ben Guild Jun 06 '16 at 05:39

1 Answer

Go is doing fine.

fmt.Println([]byte("☮️"))
//[226 152 174 239 184 143]
//Yup, 1 character - 6 bytes.
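To back up the claim that Go is doing fine on the JSON side as well, here is a minimal sketch. It assumes the response is built with the standard `encoding/json` package (the question does not show the actual generator), and `marshalTest` is a hypothetical helper name. It encodes the same two code points and checks that the result is valid JSON.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// marshalTest (hypothetical helper) encodes the payload from the question
// and returns the raw JSON bytes.
func marshalTest(value string) ([]byte, error) {
	return json.Marshal(map[string]string{"test": value})
}

func main() {
	// U+262E PEACE SYMBOL followed by U+FE0F VARIATION SELECTOR-16.
	out, err := marshalTest("\u262e\ufe0f")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", out)                     // the emoji is written as raw UTF-8, not escaped
	fmt.Println("valid JSON:", json.Valid(out)) // valid JSON: true
}
```

By default `encoding/json` writes non-ASCII text as raw UTF-8 rather than as `\uXXXX` escapes, so a client has to decode the body as UTF-8 to read it correctly.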

NSJSONSerialization can't handle this. Maybe this link will be helpful: NSJSONSerialization and Emoji. It has something to do with NSData * utf32Data = [uniText dataUsingEncoding:NSUTF32LittleEndianStringEncoding];.

Can you give us the byte representation of the "☮️" symbol "iOS style", like I did with Go?

UPD

I did some research, and it looks like something is wrong with your database encoding. Is it UTF-16?

Check this out:

// They look the same, but they are completely different "characters".
// The first one is yours; the second one is plain U+262E.
const nihongo = "☮️☮"
for index, runeValue := range nihongo {
        fmt.Printf("%#U starts at byte position %d\n", runeValue, index)
}
bad := []byte("☮️")
good := []byte("☮")
fmt.Printf("%v %s \n", bad, bad)
fmt.Printf("%v %s \n", good, good)

Output:

U+262E '☮' starts at byte position 0
U+FE0F '️' starts at byte position 3
U+262E '☮' starts at byte position 6
[226 152 174 239 184 143] ☮️ 
[226 152 174] ☮ 

UPD2

It just hit me! I was copying and pasting your symbol all along. But it is not a single symbol! It's two symbols, and the second one is unprintable.

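// Needs the "fmt" and "unicode/utf8" imports.
// These three bytes are the UTF-8 encoding of U+FE0F (VARIATION SELECTOR-16).
unprintable := []byte{239, 184, 143}
fmt.Printf("valid? %v\n", utf8.Valid(unprintable))
fmt.Println("full rune?", utf8.FullRune(unprintable))
r, size := utf8.DecodeRune(unprintable)
fmt.Println(r, size, string(r))
fmt.Printf("valid rune? %v\n", utf8.ValidRune(r))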

Output:

valid? true
full rune? true
65039 3 ️
valid rune? true

So your DB is fine and the unprintable "character" is fine, but NSJSONSerialization cannot handle it. Better to ask the iOS community =)
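Since the iOS error complains about an "Unescaped control character", one more check may be useful. This is a minimal sketch, and `controlRunes` is a hypothetical helper name. It scans a string for runes that Go's `unicode.IsControl` classifies as control characters; U+FE0F is not one of them.

```go
package main

import (
	"fmt"
	"unicode"
)

// controlRunes (hypothetical helper) returns the byte offset of every
// Unicode control character (category Cc) found in s.
func controlRunes(s string) []int {
	var offsets []int
	for i, r := range s {
		if unicode.IsControl(r) {
			offsets = append(offsets, i)
		}
	}
	return offsets
}

func main() {
	fmt.Println(controlRunes("\u262e\ufe0f")) // [] : neither code point is a control character
	fmt.Println(controlRunes("a\tb\x00"))     // [1 3] : tab and NUL are
}
```

If a check like this finds nothing in the string read from the database, the control character the client reports was most likely introduced after the bytes left the server.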

Darigaaz
  • "character" is a somewhat vague term in this context; those six bytes are two codepoints (two runes in Go speak) that make up one visible symbol. – hobbs May 31 '16 at 02:44
  • OK, but blaming this whole thing on iOS isn't fair because if I code the same script in PHP with the same database and same iOS client, it doesn't have this issue. It's something with Golang because that's the different component. – Ben Guild May 31 '16 at 04:46
  • Also, just to clarify, the storage format in the database is `utf8mb4`. It returns the emoji fine via the script in PHP, as mentioned, but not Golang. (I prefer the Golang version so I hope I can resolve this, as I cannot control user input which could be any of these variations.) – Ben Guild May 31 '16 at 04:48
  • I gave you info on how Go deals with these characters, and you gave me "it works fine with PHP" as a reason. Is that fair? Give us what PHP reads from the database (PHP can clean strings internally); try to encode it with PHP and show the output. Try to decode ☮ ([226 152 174]), 1 character, with NSJSONSerialization; try to encode the same character in iOS; try to encode the unprintable character ([239, 184, 143]) in iOS. I think you get the point. – Darigaaz May 31 '16 at 12:29
  • Right, but my point isn't to start a language battle, but rather to address the fact that something Go is doing is causing a problem with a shared data source. Regardless of what Go is doing right or wrong, in my opinion Go is generating what is considered unsafe JSON, considering that user input might have these characters in it. Would you not agree? Is there any way to protect against them on the server side? – Ben Guild Jun 01 '16 at 17:17
  • I created a related question about perhaps band-aiding this before returning the JSON... https://stackoverflow.com/questions/37575168/how-to-remove-unescaped-characters-from-json-generated-by-go ... Could you post thoughts if you have any? – Ben Guild Jun 01 '16 at 17:26
  • Did some serious digging on this and found a really odd encoding error somewhere on the client-side. So you're right, it wasn't a Go issue! It was just brought to light by the fact that Go was, in fact, doing everything right. :) – Ben Guild Jun 02 '16 at 00:09
  • I'm glad it helped. Make sure to use `encoding:NSUTF8StringEncoding` or `dataUsingEncoding:NSUTF8StringEncoding` on your NSString/NSData. – Darigaaz Jun 02 '16 at 01:25
  • If you are using NSString: NSString's default encoding is UTF-16, which may be the root of the problem. However, NSData has no encoding at all; it's a simple byte buffer. Can't say more without the client code. – Darigaaz Jun 02 '16 at 01:36
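On the question raised in the comments about protecting against this on the server side: one option, if a client cannot be fixed, is to send ASCII-only JSON. The following is a minimal sketch, not something taken from the thread. `escapeNonASCII` is a hypothetical helper that rewrites every non-ASCII rune in already-valid JSON as a `\uXXXX` escape. In valid JSON such runes can only occur inside strings, so the meaning of the document does not change.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"unicode/utf16"
	"unicode/utf8"
)

// escapeNonASCII (hypothetical helper) rewrites every non-ASCII rune in
// already-valid JSON as a \uXXXX escape, using a UTF-16 surrogate pair
// for code points above U+FFFF.
func escapeNonASCII(doc []byte) []byte {
	var buf bytes.Buffer
	for len(doc) > 0 {
		r, size := utf8.DecodeRune(doc)
		switch {
		case r < utf8.RuneSelf:
			// Plain ASCII byte: copy it through unchanged.
			buf.WriteByte(doc[0])
		case r > 0xFFFF:
			hi, lo := utf16.EncodeRune(r)
			fmt.Fprintf(&buf, `\u%04x\u%04x`, hi, lo)
		default:
			fmt.Fprintf(&buf, `\u%04x`, r)
		}
		doc = doc[size:]
	}
	return buf.Bytes()
}

func main() {
	raw, err := json.Marshal(map[string]string{"test": "\u262e\ufe0f"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(escapeNonASCII(raw))) // {"test":"\u262e\ufe0f"}
}
```

The escaped form decodes to exactly the same string, so this only works around a client that mishandles the encoding of the response body; it does not remove or alter any user input.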