Problem with decoding utf8 characters - šđžčć

Question

I have a word that contains some of these characters: šđžčć. When I take the first letter out of that word, I get a byte, and when I convert that byte into a string, the result is decoded incorrectly. Can someone help me figure out how to properly decode the extracted letter? Here is example code:

```go
package main

import (
    "fmt"
)

func main() {
    word := "ŠKOLA"
    c := word[0]

    fmt.Println(word, string(c)) // ŠKOLA Å
}
```

https://play.golang.org/p/6T2FX4vN3-U
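To see why the output is Å, it helps to look at the raw bytes. Indexing a Go string (word[0]) returns a single byte, and Š is stored in UTF-8 as the two bytes 0xC5 0xA0. Converting the lone byte 0xC5 with string(c) treats it as the code point U+00C5, which is Å. The sketch below just prints this; the helper name firstByteAsString is hypothetical and only mirrors the question's code.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstByteAsString (hypothetical helper) mirrors the question's code:
// indexing the string yields a byte, which is then converted to a string.
func firstByteAsString(s string) string {
	return string(s[0])
}

func main() {
	word := "ŠKOLA"

	// len counts bytes, RuneCountInString counts code points.
	fmt.Println(len(word), utf8.RuneCountInString(word)) // 6 5

	// The raw UTF-8 bytes: Š is the two-byte sequence c5 a0.
	fmt.Printf("% x\n", word) // c5 a0 4b 4f 4c 41

	// 0xC5 converted on its own is the code point U+00C5, 'Å'.
	fmt.Println(firstByteAsString(word)) // Å
}
```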

When you take the first letter, you have a rune, not a byte. — Jonathan Hall, Jun 25 '18 at 16:46

JimB · Accepted Answer · 2018-06-25T16:56:47.247


Š is more than one byte: in UTF-8 it is encoded as the two bytes 0xC5 0xA0, so word[0] returns only the first of them. One way to index runes is to convert the string to []rune:

```go
c := []rune(word)[0]
```

https://play.golang.org/p/NBUopxe-ik1
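A slightly fuller sketch of the []rune approach, assuming you want the n-th letter and a safe result for out-of-range indexes. The helper name nthLetter is hypothetical. The conversion decodes the whole string into a new slice, so it suits cases where you need random access to several runes.

```go
package main

import "fmt"

// nthLetter (hypothetical helper) returns the n-th code point of s as a
// string using the []rune conversion, or "" if n is out of range.
func nthLetter(s string, n int) string {
	runes := []rune(s) // each element is one decoded code point
	if n < 0 || n >= len(runes) {
		return ""
	}
	return string(runes[n])
}

func main() {
	word := "ŠKOLA"
	fmt.Println(word, nthLetter(word, 0)) // ŠKOLA Š

	// The same works for the other letters from the question.
	for i := 0; i < 5; i++ {
		fmt.Print(nthLetter("šđžčć", i), " ")
	}
	fmt.Println()
}
```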

You can also use the functions in the unicode/utf8 package, such as utf8.DecodeRune and utf8.DecodeRuneInString, to iterate over the individual code points in the UTF-8 string.

```go
r, _ := utf8.DecodeRuneInString(word)
fmt.Println(word, string(r))
```
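A sketch of iterating with utf8.DecodeRuneInString. Unlike []rune(word), which decodes the entire string into a new slice, DecodeRuneInString decodes only the rune at the start of its argument and reports its width in bytes, so you can advance through the string yourself. The helper name letters is hypothetical; a for-range loop over a string performs the same UTF-8 decoding and is shown for comparison.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// letters (hypothetical helper) splits s into its code points by
// repeatedly decoding the first rune and slicing past its width.
func letters(s string) []string {
	var out []string
	for len(s) > 0 {
		r, size := utf8.DecodeRuneInString(s)
		// For invalid UTF-8, r is utf8.RuneError and size is 1,
		// so the loop still advances.
		out = append(out, string(r))
		s = s[size:]
	}
	return out
}

func main() {
	word := "ŠKOLA"

	// Decoding only the first rune needs no []rune conversion.
	r, size := utf8.DecodeRuneInString(word)
	fmt.Println(string(r), size) // Š 2

	fmt.Println(letters(word)) // [Š K O L A]

	// for-range over a string also decodes UTF-8 and yields each
	// rune together with its starting byte offset.
	for i, ch := range "šđžčć" {
		fmt.Println(i, string(ch))
	}
}
```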

edited Jun 25 '18 at 16:56

answered Jun 25 '18 at 16:46

JimB


Any comments on the `[]rune` conversion vs. [`utf8/DecodeRuneInString`](https://godoc.org/unicode/utf8#DecodeRuneInString)? – Jun 25 '18 at 16:49

OP might find [Strings, bytes, runes and characters in Go](https://blog.golang.org/strings) useful. – twotwotwo Jun 25 '18 at 16:50
@TimCooper: the question didn't really describe the intent, but the `utf8` functions are also options I can add ;) – JimB Jun 25 '18 at 16:52
Thanks, a perfect answer – Alen Jun 25 '18 at 16:59
