6

I'm working on an I18N application which will be localized in Japanese. I don't know a word of Japanese, and I'm first wondering whether UTF-8 is enough for that language.

Usually, for European languages, UTF-8 is enough: I set my database charset/collation to utf8_general_ci (in MySQL) and serve my HTML views in UTF-8, and that's all.

But what about Japanese? Is there something else to do?

By the way, my application should handle English, French and Japanese, but later on we may need to add more languages, say Russian.

How could I set up my I18N application to be widely available without having to change much configuration on deployment?

Are there any best practices?

By the way, I'm planning to use gettext. I'm pretty sure it supports such languages without any problem, as it is the de facto standard for almost all GNU software, but any feedback?

Boris Guéry

4 Answers

5

A couple of points:

  • UTF-8 is fine for your app-internal data, but if you need to process user-supplied documents (e.g. uploads), those may use other encodings such as Shift-JIS or ISO-2022-JP.
  • Japanese text does not use whitespace between words. If your app needs to split text into words somewhere, you've got a problem.
  • Apart from text, date and number formats differ.
  • The generic collation may not give a useful sort order for Japanese text. If your app involves large lists that people have to find things in, this can be a problem.
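
To illustrate the first point, here is a minimal sketch of normalizing an upload to UTF-8 before storing it. It assumes the mbstring extension is enabled and that the uploader tells you which encoding the file uses; `toUtf8()` is a hypothetical helper name, not part of any library.

```php
<?php
// Sketch only: toUtf8() is a hypothetical helper, not a library function.
// Requires the mbstring extension. Assumes this file is saved as UTF-8.

/**
 * Convert $raw from the encoding the uploader declared to UTF-8.
 * Returns null when the bytes are not valid in that encoding.
 */
function toUtf8(string $raw, string $declaredEncoding): ?string
{
    if (!mb_check_encoding($raw, $declaredEncoding)) {
        return null; // reject rather than store garbled text
    }
    return mb_convert_encoding($raw, 'UTF-8', $declaredEncoding);
}

// Simulate a Shift-JIS upload by encoding a UTF-8 literal first.
$upload = mb_convert_encoding('日本語', 'SJIS', 'UTF-8');
$text   = toUtf8($upload, 'SJIS');

var_dump($text === '日本語');
```

Guessing the encoding with `mb_detect_encoding()` is also possible, but detection is heuristic, so asking for the encoding (or rejecting anything that is not valid UTF-8) is usually the safer design.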
Michael Borgwardt
  • Thank you, that's a great checklist. Anything about text direction? I think Japanese can be written in any direction, but is there a web standard for avoiding problems with RTL languages? – Boris Guéry Jun 08 '11 at 10:49
  • @Boris: Japanese is not rtl. It is traditionally written in vertical lines, but this is pretty much never done in electronic media. Rtl is a concern with Arabic and Hebrew, though, and you can get strange effects when mixing those with ltr text. But I don't know much more about that issue than that it exists - if you want to seriously support rtl languages, you'll have to research it. – Michael Borgwardt Jun 08 '11 at 11:56
  • I just checked and you're right; I was pretty sure I had read somewhere that Japanese could be RTL. Worrying about direction now would be premature optimization anyway, but I got curious about it. Thanks – Boris Guéry Jun 08 '11 at 18:29
  • @Boris: well, if you write Japanese or Chinese vertically, the lines go rtl. – Michael Borgwardt Jun 09 '11 at 07:37
3

Yep, Unicode contains all the code points you need to display English, French, Japanese, Russian, and pretty much any language in the world (including Taiwanese, Cherokee, Esperanto, really anything but Elvish). That's what it's for. Due to the nature of UTF-8, though, text in non-Latin scripts takes more bytes to store: ASCII characters take one byte each, while most Japanese characters take three.
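
A quick way to see the storage cost for yourself is to compare byte counts with character counts. This sketch assumes the source file is saved as UTF-8 and that the mbstring extension is available for `mb_strlen()`.

```php
<?php
// Assumes this source file is saved as UTF-8.
// strlen() counts bytes; mb_strlen() (mbstring extension) counts characters.
$samples = ['a', 'é', 'я', '日'];

foreach ($samples as $char) {
    printf(
        "%s: %d byte(s), %d character(s)\n",
        $char,
        strlen($char),
        mb_strlen($char, 'UTF-8')
    );
}
```

The Latin letter takes one byte, the accented and Cyrillic letters two each, and the kanji three. This matters mostly when sizing byte-limited columns or buffers.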

Gettext is widely used and your PHP build probably even includes it. See http://php.net/gettext for usage details.
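
For orientation, a typical gettext setup in PHP looks roughly like the sketch below. The `messages` domain name, the `./locale` directory and the `ja_JP.UTF-8` locale name are assumptions for illustration; the locale has to be installed on the server, and the compiled catalogs have to exist for any translation to show up.

```php
<?php
// Sketch only. The 'messages' domain, the ./locale directory and the
// ja_JP.UTF-8 locale name are assumptions; requires the gettext extension.
$locale = 'ja_JP.UTF-8';
putenv('LC_ALL=' . $locale);
setlocale(LC_ALL, $locale);

// Expects a compiled catalog at ./locale/ja_JP/LC_MESSAGES/messages.mo
bindtextdomain('messages', __DIR__ . '/locale');
bind_textdomain_codeset('messages', 'UTF-8');
textdomain('messages');

// _() is an alias of gettext(); it returns the original string
// when no translation is found.
echo _('Show products'), "\n";

// ngettext() picks the singular or plural form for the given count.
$count = 3;
printf(ngettext('%d product', '%d products', $count), $count);
echo "\n";
```

Because untranslated strings fall back to the original text, adding a language later (Russian, say) mostly comes down to shipping one more catalog rather than changing code.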

Wander Nauta
1

Just to add an interesting website that helps with building I18N applications: http://www.i18nguy.com/

Boris Guéry
0

If you store text in text files then it goes like this:

This is the main folder structure for language:

-lang
      -en
      -fr
      -jp
      etc

Every subfolder (en, fr, ...) contains the same files, defining the same variables with different values.

For example, in lang/en/links.txt you would have

<?php
// One class per group of strings; every language defines the same members.
class txtLinks
{
    public static $menu = "Menu";
    public static $products = "Show products";
    // ....
}

class txtErrors
{
    public static $wrongUName = "This user does not exist";
    // ....
}

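Then, when a script loads, you do

// $lang stands for whatever language your app detected for this request
// (hypothetical variable name).
if ($lang === 'en') {
    define('__LANG', 'en');
} elseif ($lang === 'fr') {
    define('__LANG', 'fr');
}
...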

Then

include 'lang/' . __LANG . '/' . $file; // $file: whatever file you want, e.g. 'links.txt'

Then this is a piece from your PHP script:

echo txtLinks::$menu; // etc.

If you go the database way, you do something analogous, where instead of files you have tables.

This way you have absolute freedom, because you can give the English files to someone who speaks, say, French, and they can fill in the French values without needing to know any programming.

And you yourself don't need to care which languages are added or removed later.

And if you work with MVC, you can split the language files by controller so you don't end up loading one huge text file.
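
One detail worth sketching for the language-selection step: if the language code comes from the request, it is safer to check it against a fixed list before using it in an include path. The `resolveLang()` helper, the `$supported` list and the `lang` query parameter below are hypothetical names chosen for illustration.

```php
<?php
// Sketch only: resolveLang(), $supported and the 'lang' query parameter
// are hypothetical names, not part of the answer above.

/**
 * Return $requested if it is a language we ship files for,
 * otherwise fall back to $default.
 */
function resolveLang(string $requested, array $supported, string $default): string
{
    return in_array($requested, $supported, true) ? $requested : $default;
}

$supported = ['en', 'fr', 'jp'];

// Only accept a plain string from the query; anything else means "use default".
$requested = is_string($_GET['lang'] ?? null) ? $_GET['lang'] : 'en';

define('__LANG', resolveLang($requested, $supported, 'en'));

// __LANG can now only be one of $supported, so it is safe in a path.
$path = 'lang/' . __LANG . '/links.txt';
```

With this in place, adding a language means creating its folder and adding its code to the list, and an unexpected value such as `../../etc` simply falls back to the default.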

Melsi