We are working on a project where we have to imitate some export output of an old legacy system.
These exports are text-based and encoded in WINDOWS-1252. Characters outside that range should be written as their decimal numeric character reference, e.g. α should become &#945;.
I tried to use htmlspecialchars, htmlentities and mb_convert_encoding, unfortunately with no luck.
Currently I'm iterating over each character of the string and checking whether it is an ASCII character. If the character is not valid ASCII, I transform it to its decimal representation using mb_ord, see my function:
    private function transformString(string $str)
    {
        if (mb_check_encoding($str, 'ASCII') === true) {
            return $str;
        } else {
            $characters = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
            $transformedString = '';
            foreach ($characters as $character) {
                if (mb_check_encoding($character, 'ASCII') === false) {
                    $character = sprintf('&#%s;', mb_ord($character));
                }
                $transformedString .= $character;
            }
            return $transformedString;
        }
    }
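To make the expected behaviour concrete, here is a minimal standalone sketch of the same per-character logic with a sample input. The free function name transformStringSketch and the sample string are only for illustration, and it assumes the input is valid UTF-8 and that the mbstring extension is available (mb_ord needs PHP 7.2 or later).

```php
<?php

// Hypothetical standalone version of the method above, for illustration only.
// Assumption: $str is valid UTF-8.
function transformStringSketch(string $str): string
{
    // Pure ASCII input needs no transformation.
    if (mb_check_encoding($str, 'ASCII')) {
        return $str;
    }

    // Split into individual UTF-8 characters (code points).
    $characters = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);

    $transformedString = '';
    foreach ($characters as $character) {
        if (!mb_check_encoding($character, 'ASCII')) {
            // Replace a non-ASCII character with its decimal code point reference.
            $character = sprintf('&#%d;', mb_ord($character, 'UTF-8'));
        }
        $transformedString .= $character;
    }

    return $transformedString;
}

echo transformStringSketch('Grüße α'), PHP_EOL; // Gr&#252;&#223;e &#945;
echo transformStringSketch('plain ASCII'), PHP_EOL; // plain ASCII
```

Note that this sketch, like the method above, escapes every non-ASCII character, including ones such as ü that WINDOWS-1252 could represent directly.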
This solution seems to work, but I'm curious whether there is a cleaner way to do this transformation.
Thanks in advance!