I have an application that has so far been in English only. Content encoding throughout templates and database has been UTF-8. I am now looking to internationalize/translate the application into languages that have character sets absolutely needing UTF-8.
The application uses various PHP string functions such as strlen()
, strpos()
, substr()
, etc, and my understanding is that I should switch these for multi-byte string functions such as mb_strlen()
, mb_strlen()
, mb_substr()
, etc, in order for multi-byte characters to be handled correctly. I've tried to read around this topic a little but virtually everything I can find goes deep into "encoding theory" and doesn't provide a simple answer to the question: If I'm using UTF-8 throughout, can I switch from using strlen()
to mb_strlen()
and expect things to work normally in for example both English and Arabic, or is there something else I still need to look out for?
Any insight would be welcome, and apologies if I'm offending someone who has encoding close to their heart with my relative ignorance.