I'm reading a Unicode stream and would rather not pass the entire string through a regex. Is there a simple, reliable character I can use to split words across languages?

My byte array will most likely be encoded in UTF-16 or UTF-8.

makerofthings7

1 Answer

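If you are using Java, you can use java.text.BreakIterator. Its word instance (BreakIterator.getWordInstance) finds locale-aware word boundaries, so you don't need a single separator character.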
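
As a rough sketch of how that could look (the class name WordSplitter and the method words are made up for illustration, and dropping segments that contain no letter or digit is my own assumption about what counts as a "word"):

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of the JDK.
class WordSplitter {
    // Returns the word-like segments of text, using the boundary rules for the given locale.
    static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);

        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String piece = text.substring(start, end);
            // Assumption: a segment is a word only if it contains a letter or digit;
            // this skips the whitespace and punctuation segments the iterator also reports.
            if (piece.codePoints().anyMatch(Character::isLetterOrDigit)) {
                out.add(piece);
            }
        }
        return out;
    }
}
```

Since BreakIterator works on text rather than bytes, you would decode the byte array first, for example with new String(bytes, StandardCharsets.UTF_8), and then call WordSplitter.words(decoded, Locale.ROOT) or pass a more specific locale if you know the language.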

Aravind Yarram