We have a MySQL database that only supports the utf8 charset, but some of our incoming data feeds contain data that requires utf8mb4 to be stored in MySQL. How can we detect, in Java, whether a string will require the utf8mb4 charset?

Saqib Ali

1 Answer

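Characters that require utf8mb4 are those outside the Basic Multilingual Plane (code points above U+FFFF), which need 4 bytes in UTF-8. In Java, each of them is represented as a surrogate pair and occupies 2 chars. A simple way to detect them is therefore to check whether the length of the string in chars differs from the number of code points: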

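// Returns true if s contains at least one surrogate pair,
// i.e. a character that needs 4 bytes in UTF-8.
boolean requiresMb4(String s) {
    int len = s.length();                    // length in UTF-16 chars
    return len != s.codePointCount(0, len);  // fewer code points than chars => surrogate pair present
}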
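
As a rough usage sketch, the same idea can be written as a scan over code points and tried on a few sample strings. The class name Mb4Check and the sample strings are made up for illustration; the check should behave the same as the method above, it just asks directly whether any code point lies above U+FFFF.

```java
final class Mb4Check {

    // True if s contains any supplementary code point (above U+FFFF),
    // i.e. a character that takes 4 bytes in UTF-8.
    static boolean requiresMb4(String s) {
        return s.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    public static void main(String[] args) {
        // Hypothetical sample inputs; the last one ends with U+1F600,
        // written as its surrogate pair.
        String[] samples = {
            "plain ASCII",
            "caf\u00E9",
            "\u4E2D\u6587",
            "smile \uD83D\uDE00"
        };
        for (String s : samples) {
            System.out.println(requiresMb4(s) + "  " + s);
        }
    }
}
```

Only the last sample should be flagged. Accented Latin letters and CJK characters in the Basic Multilingual Plane need at most 3 bytes in UTF-8, so they pass the check.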
Joni