You may want to use strings.ToValidUTF8()
for this:
ToValidUTF8 returns a copy of the string s with each run of invalid UTF-8 byte sequences replaced by the replacement string, which may be empty.
It "seemingly" does exactly what you need. Testing it:
a := []byte{'a', 0xff, 0xaf, 'b', 0xbf}
s := strings.ToValidUTF8(string(a), ".")
fmt.Println(s)
Output (try it on the Go Playground):
a.b.
I wrote "seemingly" because as you can see, there's a single dot between a
and b
: because there may be 2 bytes, but a single invalid sequence.
Note that you may avoid the []byte
=> string
conversion, because there's a bytes.ToValidUTF8()
equivalent that operates on and returns a []byte
:
a := []byte{'a', 0xff, 0xaf, 'b', 0xbf}
a = bytes.ToValidUTF8(a, []byte{'.'})
fmt.Println(string(a))
Output will be the same. Try this one on the Go Playground.
If it bothers you that multiple (invalid sequence) bytes may be shrinked into a single dot, read on.
Also note that to inspect arbitrary byte slices that may or may not contain texts, you may simply use hex.Dump()
which generates an output like this:
a := []byte{'a', 0xff, 0xaf, 'b', 0xbf}
fmt.Println(hex.Dump(a))
Output:
00000000 61 ff af 62 bf |a..b.|
There's your expected output a..b.
with other (useful) data like the hex offset and hex representation of bytes.
To get a "better" picture of the output, try it with a little longer input:
a = []byte{'a', 0xff, 0xaf, 'b', 0xbf, 50: 0xff}
fmt.Println(hex.Dump(a))
00000000 61 ff af 62 bf 00 00 00 00 00 00 00 00 00 00 00 |a..b............|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000030 00 00 ff |...|
Try it on the Go Playground.