
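I have a String that may contain 4-byte characters, i.e. characters outside the Basic Multilingual Plane, which Java stores as a surrogate pair of two char values. For example: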

String s = "\uD83D\uDC4D1234\uD83D\uDC4D";

I also have a size that I need to use to take a substring from the start of it. The size is in characters (code points), not in char values. So if the size is 5, I should get the first 4-byte character followed by "1234".

Calling s.substring(0, 5) directly gives the wrong result: it returns the first character followed by just "123", because the first character occupies two char positions.
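To make the mismatch concrete, here is a minimal sketch that compares the two ways of counting. The class name SurrogateDemo and its helper methods are only illustrative, not part of any library:

```java
class SurrogateDemo {
    // Number of UTF-16 code units: this is what length() and substring() count.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points: what the size in this question refers to.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String s = "\uD83D\uDC4D1234\uD83D\uDC4D";
        System.out.println(codeUnits(s));   // prints 8
        System.out.println(codePoints(s));  // prints 6
        // substring works on code units, so the end index 5 falls after "123".
        System.out.println(s.substring(0, 5).equals("\uD83D\uDC4D123")); // prints true
    }
}
```

Each of the two emoji takes two char values, so the string has 8 code units but only 6 code points.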

I managed to get the right result using code points this way:

String s = "\uD83D\uDC4D1234\uD83D\uDC4D";
StringBuffer buf = new StringBuffer();
long size = 5;
s.codePoints().forEachOrdered(charInt -> {
    if(buf.codePoints().count() < size) {
        buf.appendCodePoint(charInt);
    }
});

I suspect there is a cleaner and more efficient way to achieve this.

Federico Pugnali

1 Answer


You can use offsetByCodePoints to find the char index that lies 5 code points past the start of the string, and then pass that index as the second argument to substring:

String s = "\uD83D\uDC4D1234\uD83D\uDC4D";
String sub = s.substring(0, s.offsetByCodePoints(0, 5));

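One caveat: offsetByCodePoints throws IndexOutOfBoundsException if the string has fewer code points than requested. If the size can exceed the length of the string, a small wrapper can clamp it first. The following is a sketch under that assumption; the class name CodePointPrefix and the method prefix are hypothetical names chosen for illustration:

```java
class CodePointPrefix {
    // Returns the first `count` code points of `s`,
    // or all of `s` if it contains fewer than `count` code points.
    static String prefix(String s, int count) {
        if (count <= 0) {
            return "";
        }
        // Count the code points up front so that offsetByCodePoints
        // is never asked to move past the end of the string.
        int available = s.codePointCount(0, s.length());
        if (count >= available) {
            return s;
        }
        // Translate the code point count into a char index for substring.
        return s.substring(0, s.offsetByCodePoints(0, count));
    }
}
```

With the string from the question, CodePointPrefix.prefix(s, 5) gives the first emoji followed by "1234", and CodePointPrefix.prefix(s, 100) returns the whole string instead of throwing.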

4castle