I have been using iconv to convert data from my company's database (windows-1250) to UTF-8. It all worked fine until recently.

I'm not sure what happened, as I noticed the change only recently. The problem is that iconv seems to have stopped converting correctly, although it is clearly still being called: it still throws a notice when I pass a bad encoding name.

Earlier, when I saved a string to the db with

`htmlspecialchars(iconv('UTF-8', 'windows-1250', $string), ENT_QUOTES)`

it was fine. Now only question marks are written to my db instead of e.g. `ąęś`.

When I correct them via PL/SQL Developer and read them back in PHP with

`htmlspecialchars_decode(iconv('windows-1250', 'UTF-8', $string), ENT_QUOTES)`

I receive `aes`. I tried setting the encoding in PHP right before the string is output, with `header('Content-Type: text/html; charset=utf-8');`, but it didn't help.

My software is:

  • PHP 5.3.15 (cli)
  • iconv (GNU libc) 2.15
  • Apache/2.2.22
  • openSUSE 12.2
  • Oracle 10.2.0.4
  • oci
Vadim K.
maialithar
  • You should add the error/warning/notice messages you get to your question; that would probably help us say more. If you use iconv at the command line, it should also tell you at which byte offset the problem occurs. This can be helpful to pinpoint the issue. You should also provide the subject string in question, both verbatim and as a hexdump. – hakre Jun 23 '13 at 12:41
  • Thank you for your suggestions. Unfortunately, I receive no errors/warnings/notices. The conversion just stopped working. I believe it has something to do with the `iconv` version in the OS, but I have no proof. – maialithar Jun 24 '13 at 06:18
  • Check the return value of iconv. If it is boolean FALSE the conversion failed. You then have provided wrong input data. E.g. the wrong binary sequences for the given input encoding. – hakre Jun 24 '13 at 06:42
  • It is not `FALSE`. As I wrote, I receive `iconv()` result - but not what I expect or what I used to receive. Instead of `ąśę` I get `ase`. I simply don't know why : ( – maialithar Jun 25 '13 at 12:58
  • I guess that is done in the database layer already. With iconv you would need to specify a flag for that and I don't see that in your question. (this is called *transliteration*, just FYI) – hakre Jun 25 '13 at 13:07
  • Is it possible that php's `oci_fetch` or `oci_result` converts strings by itself? I was unable to find it in the docs. My output string looks the same whether I use `iconv()` or not. – maialithar Jun 25 '13 at 13:27
  • Then my nose was right, it happens earlier. I don't know about OCI; maybe it already happens on insert? What is in the database itself when you use a database viewer? – hakre Jun 25 '13 at 13:31
  • I use `PL/SQL Developer` and it returns proper chars. – maialithar Jun 25 '13 at 13:32
  • then check the charset used for your connection from the php script. – hakre Jun 25 '13 at 13:34
  • Your nose was right, it was somewhere in the middle. I'll post an answer - thanks! – maialithar Jun 25 '13 at 14:16
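
The checks hakre suggests in the comments (test the return value of `iconv()` against `FALSE`, and look at a hexdump of the string before and after) can be sketched roughly as below. This is only an illustration: `convert_checked` is a hypothetical helper name, not something from the question, and the input is written as explicit bytes so the result does not depend on how the script file itself is encoded.

```php
<?php
// Hypothetical helper: run iconv() and fail loudly instead of silently
// carrying on with a FALSE return value.
function convert_checked($from, $to, $text)
{
    $result = @iconv($from, $to, $text);
    if ($result === false) {
        throw new RuntimeException("iconv failed converting $from to $to");
    }
    return $result;
}

// "ąęś" spelled out as UTF-8 bytes.
$utf8 = "\xC4\x85\xC4\x99\xC5\x9B";
$cp1250 = convert_checked('UTF-8', 'windows-1250', $utf8);

// Hexdumps show what is really in the strings, independent of the terminal.
echo bin2hex($utf8), "\n";   // c485c499c59b
echo bin2hex($cp1250), "\n"; // b9ea9c when the conversion is intact
```

If the second hexdump shows `b9ea9c` here but the string read from the database dumps as plain ASCII such as `616573` (`aes`), the characters were presumably lost before `iconv()` ever saw them, which points at the database connection rather than at iconv.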

1 Answer

After some help from hakre, I was able to solve my problem.

My strings were already transliterated when I selected them from the database. The fourth parameter of PHP's `oci_connect()` is the character set. If you don't provide it, it is taken from the `NLS_LANG` environment variable.

I had neither the fourth parameter nor the environment variable set, so my database connection charset was wrong. Once I added the `NLS_LANG` variable, it started to work fine.
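
For anyone who would rather pass the character set explicitly than rely on `NLS_LANG`, a minimal sketch might look like this. The function name `connect_cp1250` is made up for illustration. It assumes you want the client side to keep receiving windows-1250 bytes (Oracle calls that character set `EE8MSWIN1250`) so the existing `iconv()` calls stay valid.

```php
<?php
// Hypothetical wrapper around oci_connect() that always states the
// client character set instead of depending on the environment.
function connect_cp1250($user, $password, $connectionString)
{
    // Fourth argument: Oracle character set name used by the client.
    // EE8MSWIN1250 is Oracle's name for windows-1250.
    $conn = oci_connect($user, $password, $connectionString, 'EE8MSWIN1250');
    if ($conn === false) {
        $e = oci_error(); // no argument: error from the failed connect
        throw new RuntimeException('Connection failed: ' . $e['message']);
    }
    return $conn;
}
```

It would be called as, say, `connect_cp1250('user', 'secret', 'localhost/XE')`, where the credentials and connection string are placeholders. Passing `'AL32UTF8'` instead should make Oracle hand back UTF-8 directly, in which case the `iconv()` round trip would no longer be needed, but I have not tested that variant here.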

Thank you hakre! : )

Community
maialithar