Keep in mind that ISO-8859-1 only supports a tiny subset of characters compared to Unicode. If you know for certain that your UTF-8 encoded string only contains characters covered by ISO-8859-1, you can use the following code.
    package main

    import (
        "fmt"

        "golang.org/x/text/encoding/charmap"
    )

    func main() {
        str := "Räv"

        // Convert the UTF-8 bytes of str to ISO-8859-1.
        encoder := charmap.ISO8859_1.NewEncoder()
        out, err := encoder.Bytes([]byte(str))
        if err != nil {
            // Returned when str contains a rune outside ISO-8859-1.
            panic(err)
        }

        fmt.Printf("%x\n", out)
    }
The above prints:
52e476
So 0x52, 0xE4, 0x76, which looks correct as per https://en.wikipedia.org/wiki/ISO/IEC_8859-1. The second character is of particular note, since it would be encoded as the two bytes 0xC3, 0xA4 in UTF-8.
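To see the difference for yourself, you can dump the UTF-8 bytes of the same string using only the standard library. This is a minimal sketch; the helper name utf8Hex is made up for illustration, and the string is written with a \u escape so the result does not depend on how your editor saves the ä.

```go
package main

import "fmt"

// utf8Hex is a hypothetical helper that returns the hex form of the
// UTF-8 bytes backing a Go string.
func utf8Hex(s string) string {
	return fmt.Sprintf("%x", []byte(s))
}

func main() {
	// "R\u00e4v" is "Räv" with a precomposed ä (U+00E4).
	fmt.Println(utf8Hex("R\u00e4v")) // 52c3a476
}
```

Compared with the ISO-8859-1 output above, the only difference is the middle character: two bytes (c3 a4) instead of one (e4).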
If the string contains characters that aren't supported, e.g. we change str to "Räv😀v" so that it includes an emoji, then encoder.Bytes([]byte(str)) returns an error:
panic: encoding: rune not supported by encoding.
goroutine 1 [running]:
main.main()
/Users/nj/Dev/scratch/main.go:15 +0x109
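If you would rather detect this situation up front than hit the error, one option is to check the string before encoding it. The sketch below uses only the standard library and assumes that ISO-8859-1 covers exactly the code points U+0000 through U+00FF, which is how the encoding is laid out; fitsLatin1 is a hypothetical helper name, and the emoji (U+1F600) is an arbitrary example.

```go
package main

import "fmt"

// fitsLatin1 is a hypothetical helper that reports whether every rune in s
// is at most U+00FF, i.e. within the assumed ISO-8859-1 repertoire.
// Invalid UTF-8 decodes to U+FFFD when ranging over a string, so such
// input is reported as not fitting.
func fitsLatin1(s string) bool {
	for _, r := range s {
		if r > 0xFF {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(fitsLatin1("R\u00e4v"))            // true
	fmt.Println(fitsLatin1("R\u00e4v\U0001F600v")) // false
}
```

This only tells you whether the conversion can succeed; you still need the encoder (or EncodeRune) to produce the bytes.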
If you wish to address that by accepting loss of unconvertible characters, a simple solution might be to leverage EncodeRune, which returns a boolean to indicate if the rune is in the charmap's repertoire.
    package main

    import (
        "fmt"

        "golang.org/x/text/encoding/charmap"
    )

    func main() {
        str := "Räv😀v"
        out := make([]byte, 0, len(str))

        // Encode rune by rune, keeping only what ISO-8859-1 can represent.
        for _, r := range str {
            if e, ok := charmap.ISO8859_1.EncodeRune(r); ok {
                out = append(out, e)
            }
        }

        fmt.Printf("%x\n", out)
    }
The above prints:
    52e47676
i.e. the emoji has been stripped.