
It's been a long couple of days and my head's getting a little fried. I haven't done much binary arithmetic since leaving university and I'm struggling to work this one out.

I've got a fairly locked-down system based on PHP 5.6 that includes neither the mbstring functions nor iconv. I've already got a function (from elsewhere) that converts from UTF-16 to UTF-8, but now I need the reverse.

The algorithm for an individual character seems fairly straightforward when I look at Wikipedia, although I'm a little rusty on the exact procedure. I believe bit-shifting and masking will be necessary.

However, I want to convert an entire string. How can I determine where each character starts and ends?

Can some kind soul out there help me out? I imagine the function itself won't be that complicated to someone who knows what they're doing. I'm so out of practice that I'm getting myself tied up in knots.

  • The first few bits of the first byte of a UTF-8 encoded code point will tell you how many bytes it uses. Take a look at my answer [here](https://stackoverflow.com/a/45089061/3942918) for some details on that. Start with that, going from UTF-8 to code point, then encode the code point as UTF-16. There you'll need to use 2 bytes for every code point below 65536 (U+10000); from there up you need to translate to surrogate pairs, as your Wikipedia link describes. – user3942918 Jul 29 '17 at 04:12
  • Thanks @Paul. I've abandoned this part of the project for now, but your information might come in handy if I'm asked to pick it up again or for anyone else that might have similar problems. – CyberneticianDave Aug 04 '17 at 08:50
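
The approach described in the first comment could be sketched roughly as follows. This is only a minimal sketch, not a tested production routine: the function name `utf8_to_utf16` and its `$bigEndian` parameter are made up for illustration, the output carries no BOM, and validation is deliberately light (malformed bytes, encoded surrogates and out-of-range values become U+FFFD, while overlong encodings are not rejected). It uses only `strlen`, `ord` and `pack`, so it should not need mbstring or iconv.

```php
<?php
// Hypothetical helper (not from the original post): UTF-8 to UTF-16 without
// mbstring or iconv. Big-endian by default, no BOM is written.
function utf8_to_utf16($utf8, $bigEndian = true)
{
    $fmt = $bigEndian ? 'n' : 'v'; // 16-bit unsigned, big- or little-endian
    $out = '';
    $len = strlen($utf8);
    $i = 0;

    while ($i < $len) {
        $b = ord($utf8[$i]);
        $i++;

        // The high bits of the lead byte give the length of the sequence.
        if ($b < 0x80) {                    // 0xxxxxxx: 1 byte (ASCII)
            $cp = $b;
            $extra = 0;
        } elseif (($b & 0xE0) === 0xC0) {   // 110xxxxx: 2 bytes
            $cp = $b & 0x1F;
            $extra = 1;
        } elseif (($b & 0xF0) === 0xE0) {   // 1110xxxx: 3 bytes
            $cp = $b & 0x0F;
            $extra = 2;
        } elseif (($b & 0xF8) === 0xF0) {   // 11110xxx: 4 bytes
            $cp = $b & 0x07;
            $extra = 3;
        } else {                            // stray continuation or invalid lead byte
            $cp = 0xFFFD;
            $extra = 0;
        }

        // Each continuation byte (10xxxxxx) contributes 6 more bits.
        for ($k = 0; $k < $extra; $k++) {
            if ($i >= $len || (ord($utf8[$i]) & 0xC0) !== 0x80) {
                $cp = 0xFFFD;               // truncated or malformed sequence
                break;
            }
            $cp = ($cp << 6) | (ord($utf8[$i]) & 0x3F);
            $i++;
        }

        // Surrogate code points and values past U+10FFFF are not valid.
        if ($cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
            $cp = 0xFFFD;
        }

        if ($cp < 0x10000) {
            // Basic Multilingual Plane: one 16-bit unit.
            $out .= pack($fmt, $cp);
        } else {
            // Supplementary planes: split 20 bits into a surrogate pair.
            $cp -= 0x10000;
            $out .= pack($fmt, 0xD800 | ($cp >> 10));
            $out .= pack($fmt, 0xDC00 | ($cp & 0x3FF));
        }
    }

    return $out;
}
```

For example, `bin2hex(utf8_to_utf16("\xE2\x82\xAC"))` (the euro sign, U+20AC) should give `20ac`, and passing `false` as the second argument switches to little-endian output. Whether a BOM is needed depends on whatever consumes the result.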

0 Answers