This is related to https://stackoverflow.com/questions/1791082/utf-8-php-and-xml-mysql, which I am still trying to get my head around.

I have a couple of separate questions that will hopefully help me understand how to resolve the issues I am having.

I am trying to read values from a database and write them to a file in UTF-8. But I am having encoding issues, so I thought I would strip back all my code and start with:

$string = "Otivägen";
// then output to a file.

But in vim I can’t even enter that string; every time I paste it in I get OtivÃ¤gen.

I tried to create a blank PHP file with only that string and upload it, but when I cat the file again I get OtivÃ¤gen.

My questions are:

  1. Why is vim displaying it like this?
  2. If the file is downloaded, would it display correctly if an application was expecting UTF-8?
  3. How can I output this string into a file that will eventually be an XML file in UTF-8 encoding?

My understanding of encoding is limited at the moment, and I am trying to understand it.

TRiG
icelizard

2 Answers

1) Why is vim displaying it like this?

This looks like vim is displaying UTF-8-encoded data as ISO 8859-1. Copy and paste can be unreliable (you don't say what system you are on), so I'd advise typing the text in directly.
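The effect can be reproduced outside vim. The following is only a sketch, assuming PHP's mbstring extension is loaded: it takes the UTF-8 bytes of the string from the question, reads each byte as one ISO 8859-1 character, and shows that the single "ä" turns into two characters.

```php
<?php
// Assumes the mbstring extension is loaded.
// "\xC3\xA4" is the two-byte UTF-8 encoding of "ä".
$utf8 = "Otiv\xC3\xA4gen";

// Treat every byte as one ISO 8859-1 character and re-encode as UTF-8,
// which is roughly what a Latin-1 display does to UTF-8 input.
$misread = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo $misread, "\n";          // shows two characters where "ä" should be
echo bin2hex($misread), "\n"; // 4f746976c383c2a467656e
```

The bytes in the file are fine in this scenario; only the interpretation on display is wrong.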

To properly edit the file in vim, first set vim to use UTF-8:

:set encoding=utf-8

Then type in the text, make sure it's correctly displayed, and save. That will give you a file encoded in UTF-8.

2) If the file is downloaded would it display correctly if an application was expecting UTF-8?

Depends on the encoding. If you save it as above, then yes.

3) How can I output this string into a file that will eventually be an XML file in UTF-8 encoding?

That is apparently very difficult. I'm not that familiar with PHP, but according to Wikipedia:

PHP currently does not have native support for Unicode or multibyte strings; Unicode support will be included in PHP 6[...]

So you'll probably have to google for a workaround. There are also a few UTF-8 helper libraries for PHP. Otherwise it might be better to choose a different language, e.g. Java, which has solid Unicode support.
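For what it is worth, a PHP string is a plain byte sequence, so a value that already holds valid UTF-8 can be written out unchanged. A minimal sketch, assuming the mbstring extension is loaded; the element name `street` and the output path are hypothetical and only there for illustration:

```php
<?php
// Hypothetical value; in the question it would come from the database.
$street = "Otiv\xC3\xA4gen"; // UTF-8 bytes for "Otivägen"

// Refuse to write anything that is not valid UTF-8.
if (!mb_check_encoding($street, 'UTF-8')) {
    throw new RuntimeException('value is not valid UTF-8');
}

// Escape XML special characters; non-ASCII bytes pass through unchanged.
$escaped = htmlspecialchars($street, ENT_XML1 | ENT_QUOTES, 'UTF-8');

$xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
     . '<street>' . $escaped . '</street>' . "\n";

// Hypothetical output path in the system temp directory.
$path = sys_get_temp_dir() . '/street-example.xml';
file_put_contents($path, $xml);

// Read the file back to confirm the bytes on disk are still valid UTF-8.
$written = file_get_contents($path);
var_dump(mb_check_encoding($written, 'UTF-8'));
```

This only works if every step before it (database connection, source file, any intermediate conversion) also delivers UTF-8; the check at the top is there to catch the case where it does not.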

sleske
    PHP can do multibyte strings, you just have to use the right functions for the job; the default string functions aren't multibyte-aware: http://php.net/manual/en/book.mbstring.php – quack quixote Nov 27 '09 at 03:29
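To illustrate the comment above, here is a small sketch (assuming the mbstring extension is loaded) of how the byte-oriented string functions and their `mb_` counterparts differ on the string from the question:

```php
<?php
// Assumes the mbstring extension is loaded.
$s = "Otiv\xC3\xA4gen"; // UTF-8 bytes for "Otivägen"

$byteCount = strlen($s);             // counts bytes: 9
$charCount = mb_strlen($s, 'UTF-8'); // counts characters: 8

$byteCut = substr($s, 0, 5);             // cuts "ä" in half
$charCut = mb_substr($s, 0, 5, 'UTF-8'); // "Otivä"

var_dump(mb_check_encoding($byteCut, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($charCut, 'UTF-8')); // bool(true)
```

Plain reading and writing of the string is unaffected; the difference only matters when counting, slicing, or changing case.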

UTF8 is fun. Once it works. :-/ If anything in the chain is expecting something else and doesn't check, then it all goes pear-shaped.

  • You need to use a terminal program that supports UTF8. Gnome-terminal does. KTerm does. ETerm doesn't.
  • Check your LANG variable in your shell. Mine is en_AU.UTF-8 which means English (Australian) in UTF8.
  • vim should inherit this from the shell. You should be able to verify this with :set encoding

I just tried it and that made everything work.

The key is that everything needs to be in UTF8 mode.
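The locale check from the list above can also be done from PHP, which is handy when the script runs on a server you cannot inspect interactively. This is a rough sketch; `localeLooksUtf8` is a hypothetical helper name, and the pattern is only a heuristic for the usual `xx_YY.UTF-8` style of locale names:

```php
<?php
// Hypothetical helper: guesses from a locale string whether it names UTF-8.
function localeLooksUtf8(string $locale): bool
{
    // Matches ".UTF-8", ".utf8", ".UTF8", and so on.
    return preg_match('/\.utf-?8/i', $locale) === 1;
}

// LC_ALL overrides LC_CTYPE, which overrides LANG.
$locale = getenv('LC_ALL') ?: getenv('LC_CTYPE') ?: getenv('LANG') ?: '';

echo $locale === '' ? '(no locale set)' : $locale, "\n";
echo localeLooksUtf8($locale) ? "UTF-8 locale\n" : "not a UTF-8 locale\n";
```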

staticsan