2

How do I convert CP-850 to UTF-8 using PHP?

What I've found and tried

Gist - pedrosancao/CharsetConversion.php

This seems like a fine solution, but it is too slow for the amount of text I need to convert. One process ran for 40 minutes to convert 1.5 MB of data.

dos2unix

I tried dos2unix -c iso -850 data.csv, which converts CP-850 to ASCII, but this didn't do the trick.

Timon de Groot

3 Answers

0

Have you tried using iconv? $str = iconv("CP850","UTF-8", $str);

Optionally with transliteration (//TRANSLIT appended to the second parameter, e.g. "UTF-8//TRANSLIT").

A second suggestion would be to use recode_file or recode_string: http://php.net/manual/en/ref.recode.php (note that the recode extension was unbundled from PHP as of 7.4, so it may not be available).
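As a minimal sketch of the iconv suggestion above: the wrapper name cp850ToUtf8 is hypothetical and only there to show the false-return check, and the sample bytes are an assumption (0x82 is é in CP850).

```php
<?php
// Hypothetical helper: converts a CP850 byte string to UTF-8 with iconv().
function cp850ToUtf8(string $bytes): string
{
    $converted = iconv('CP850', 'UTF-8', $bytes);
    if ($converted === false) {
        // iconv() returns false when the conversion fails.
        throw new RuntimeException('iconv could not convert the input');
    }
    return $converted;
}

// "\x82" is the CP850 byte for é, so this prints: café
echo cp850ToUtf8("caf\x82"), PHP_EOL;
```

Because iconv works on the whole string in one call, this should be far faster than a character-by-character lookup table for large inputs.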

martijn
0

Use iconv:

iconv -f CP850 -t UTF-8 data.csv > data-utf8.csv

If you also need to convert line breaks, pipe the output through dos2unix:

iconv -f CP850 -t UTF-8 data.csv | dos2unix > data-utf8.csv

dos2unix -c iso -850 data.csv converts the file from CP850 to ISO-8859-1 (Latin 1), not to UTF-8. ISO-8859-1 is closely related to, but not identical with, the Windows code page CP1252.
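If the conversion has to happen inside PHP rather than on the command line, the same pipeline can be approximated roughly as follows. This is only a sketch: the function name convertCp850File and the file names in the usage note are hypothetical, and it assumes the input uses CRLF or LF line endings.

```php
<?php
// Hypothetical helper: streams a CP850 file to a UTF-8 file line by line,
// so the whole file never has to be held in memory.
function convertCp850File(string $source, string $target): void
{
    $in = fopen($source, 'rb');
    $out = fopen($target, 'wb');
    if ($in === false || $out === false) {
        throw new RuntimeException('could not open input or output file');
    }

    while (($line = fgets($in)) !== false) {
        // CP850 is a single-byte encoding, so a line break can never
        // fall in the middle of a character.
        $utf8 = iconv('CP850', 'UTF-8', $line);
        if ($utf8 === false) {
            throw new RuntimeException('iconv could not convert a line');
        }
        // Rough equivalent of the dos2unix step: CRLF becomes LF.
        fwrite($out, str_replace("\r\n", "\n", $utf8));
    }

    fclose($in);
    fclose($out);
}
```

A call such as convertCp850File('data.csv', 'data-utf8.csv') would then mirror the second command above.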

0

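I ended up using utf8_encode; I should have tried that a bit earlier. Note that utf8_encode treats its input as ISO-8859-1, so it only gives correct results when the data is actually Latin 1 rather than CP-850.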
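To check which source encoding a file really uses, it can help to decode the same bytes both ways and compare. The sketch below uses iconv with ISO-8859-1 as a stand-in for utf8_encode (which performs the same Latin 1 to UTF-8 mapping and is deprecated since PHP 8.2); the helper name decodeAs and the sample bytes are assumptions for illustration.

```php
<?php
// Hypothetical helper: decodes raw bytes from the given encoding to UTF-8.
function decodeAs(string $encoding, string $bytes): string
{
    $converted = iconv($encoding, 'UTF-8', $bytes);
    if ($converted === false) {
        throw new RuntimeException("iconv could not convert from $encoding");
    }
    return $converted;
}

$raw = "caf\x82"; // 0x82 is é in CP850, but a control character in ISO-8859-1

$asCp850  = decodeAs('CP850', $raw);      // "café"
$asLatin1 = decodeAs('ISO-8859-1', $raw); // "caf" followed by U+0082

// The two interpretations differ for bytes above 0x7F.
var_dump($asCp850 === $asLatin1);
```

If the Latin 1 interpretation is the one that looks right, the data was never CP-850 to begin with, which would explain why utf8_encode worked.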

Timon de Groot