Go's unicode package contains useful functions such as IsGraphic and IsPrint, but it has no IsAssigned. I could build one from the other functions, but I would expect the standard library to provide it. In Java, writing this function is easy:

boolean isAssigned(int codePoint) {
    return Character.getType(codePoint) != Character.UNASSIGNED;
}

In Go there is no function unicode.Type(rune) or unicode.IsAssigned(rune). The closest I could find is this:

func IsAssigned(r rune) bool {
    return unicode.IsControl(r) ||
            unicode.IsGraphic(r) ||
            unicode.IsSymbol(r)
}

But that code reports U+00AD (soft hyphen) as unassigned, which is wrong: it is a format character (category Cf), which none of the three functions covers.

How can I get correct information about unassigned code points?

Roland Illig

1 Answer

I think you can check whether a code point is assigned by testing it with unicode.Is against every table in unicode.Categories (though this is not efficient), i.e.

func IsAssigned(r rune) bool {
    for _, v := range unicode.Categories {
        if unicode.Is(v, r) {
            return true
        }
    }
    return false
}

A working example is on the Go Playground.
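A slightly more defensive variant of the same idea, as a sketch rather than a definitive implementation: it checks only the two-letter general categories and explicitly skips Cn (unassigned). The assumption behind this is that unicode.Categories may contain a Cn entry, and a C table that includes it, in newer Go releases (I believe this changed in Go 1.25); on older releases the extra checks are harmless. The sample code points in main are my own choice for illustration.

```go
package main

import (
	"fmt"
	"unicode"
)

// IsAssigned reports whether r belongs to any two-letter general category
// other than Cn (unassigned).
func IsAssigned(r rune) bool {
	for name, table := range unicode.Categories {
		// Skip the one-letter major classes ("L", "C", ...), the "LC" union,
		// and "Cn". The major classes only repeat what the two-letter tables
		// cover, and "C" may include unassigned code points (assumption about
		// newer Go releases).
		if len(name) != 2 || name == "Cn" || name == "LC" {
			continue
		}
		if unicode.Is(table, r) {
			return true
		}
	}
	return false
}

func main() {
	// 'A' (Lu), soft hyphen (Cf), U+0378 (unassigned), U+E000 (private use, Co).
	for _, r := range []rune{'A', 0x00AD, 0x0378, 0xE000} {
		fmt.Printf("%U assigned: %v\n", r, IsAssigned(r))
	}
}
```

Like Java's Character.getType, this treats private-use (Co) and surrogate (Cs) code points as assigned, because they have a general category other than Cn.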

putu
  • Are you sure that all assigned characters belong to some category? (It is kind of obvious, but maybe you have a quick proof.) – noonex May 13 '19 at 15:32
  • Actually, the code below shows that 260+K runes belong to categories, while only ~130K code points are assigned. Am I doing something wrong? https://play.golang.org/p/gIcNVHa5PG6 – noonex May 13 '19 at 15:57