1

I am working with Java and PostgreSQL on Windows. I have some words that include Turkish characters such as İ, ş, ö, ç, etc.

In Java I assign the words to a string and try to write it to the database. When I print the string in Java, all characters display correctly. However, when it is written to the database, the text appears mangled.

I created my database with this command:

CREATE DATABASE dbname ENCODING "UTF-8"

I tried to fix it by writing the Turkish characters as Unicode escapes (İ -> \u0130, ş -> \u015F) and converting the string through ISO-8859-1:

// \u0130leti\u015Fim = İletişim
String title = "\u0130leti\u015Fim";
// Encode as ISO-8859-1 bytes, then decode those bytes as UTF-8
String mytitle = new String(title.getBytes("ISO-8859-1"), "UTF-8");

Then I tried to write mytitle to the database, but it did not work.

Thanks for your advice.

SOLVED: I realized that the Turkish characters were being written to the database correctly; the problem was in the response. I added these lines before writing to the response:

String contentType = "text/html;charset=UTF-8";
response.setContentType(contentType);
response.setCharacterEncoding("utf-8");

After adding this, it works. I hope I explained it clearly.

sdirlik

3 Answers

3

When you call title.getBytes("ISO-8859-1"), you're promising the Java runtime that the characters in the string can be represented as ISO-8859-1 bytes, which is not true for either \u0130 or \u015f. So the conversion to bytes already does something unspecified with your Turkish characters; in practice each one is typically replaced by a '?' byte.

Next, attempting to interpret whichever bytes you get out of it as UTF-8 even though they're really ISO-8859-1 is then guaranteed to make a complete mess of everything that wasn't ASCII to begin with.

(The repertoire of ISO-8859-1 happens to coincide exactly with the Unicode characters that can be written as \u00XX for some XX.)
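To make the effect concrete, here is a minimal sketch of the round trip from the question. The class and method names are made up for illustration. It uses the Charset overload of getBytes, whose behavior for unmappable characters is specified (they are replaced with the charset's replacement byte), rather than the charset-name overload used in the question, whose behavior is unspecified.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the asker's program.
final class EncodingRoundTrip {

    // Encode to ISO-8859-1 bytes, then decode those bytes as UTF-8,
    // mirroring the conversion attempted in the question.
    static String lossyRoundTrip(String s) {
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String title = "\u0130leti\u015Fim"; // İletişim
        String mangled = lossyRoundTrip(title);

        System.out.println(mangled.equals(title)); // prints "false"
        System.out.println(mangled);               // prints "?leti?im"
    }
}
```

The two Turkish letters are lost at the getBytes step, before any database is involved. Passing the original string straight to the JDBC driver avoids this.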

hmakholm left over Monica
  • I changed my code to: title = "İletişim"; mytitle = new String(title.getBytes(), "UTF-8"); I also tried it on Ubuntu, but it still cannot write Turkish characters to the database correctly. – sdirlik Sep 19 '12 at 07:54
  • 1
    But what on earth are you trying to achieve by converting the string to bytes and back to a string again? _At best_ you will get the original string back, but possibly some of the Turkish characters that are _already_ in the string will get destroyed by that pointless exercise. There's no possible way string->bytes->string can do anything useful for you. – hmakholm left over Monica Sep 19 '12 at 10:00
2

With encoding issues you have several things to check:

  • Whether your source file is in the encoding you expect it to be.
  • How client_encoding is set
  • What the database encoding is

In the case of Java, PgJDBC requires client_encoding to always be UTF-8 and will choke if you set it to something else, so that's not going to be the issue. You've shown that your database is UTF-8 too. So it seems likely that your Java sources aren't in the same encoding the Java compiler and runtime expect them to be in.

By default, javac interprets your source code in the platform default encoding (JDK 18 and later default to UTF-8). If you've saved your sources in a different encoding and haven't told javac so with its -encoding option, weird things will happen. Save your sources either:

  • in the default encoding for your Windows platform;
  • as Unicode ("UTF-16" or "UCS-2"); or
  • as UTF-8 with a Byte Order Mark (BOM). Many programs don't add a BOM for UTF-8.

Then recompile your program. If that doesn't help, you'll need to follow up with more detail, starting with what exactly "it did not work" means, output of SELECTing the data you inserted with Java using psql, etc.
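One way to narrow down where the text goes wrong is to print the code points of the string right before it is handed to the database. The helper below is a small sketch; the class and method names are invented for illustration. The literal is written with Unicode escapes so the example itself does not depend on the source file encoding; to test your compiler settings, replace it with the literal typed directly in your source.

```java
import java.util.stream.Collectors;

// Hypothetical diagnostic helper, not part of PgJDBC or the asker's code.
final class CodePointDump {

    // Render each code point of the string as U+XXXX, separated by spaces.
    static String dump(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        String title = "\u0130leti\u015Fim"; // İletişim
        // prints "U+0130 U+006C U+0065 U+0074 U+0069 U+015F U+0069 U+006D"
        System.out.println(dump(title));
    }
}
```

If a literal typed as İletişim starts with something like U+00C4 U+00B0 instead of U+0130, the compiler most likely decoded a UTF-8 source file with a single-byte Windows code page. If the code points are correct here, the problem lies further downstream.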

Craig Ringer
  • When I save it as UTF-16 or UCS-2, it gives errors. I also tried saving it as UTF-8 with a BOM, but nothing changed; it still cannot write Turkish characters to the database. I also tried it on Ubuntu, with the same result. – sdirlik Sep 19 '12 at 07:53
  • @sinan "It gives errors". What errors, exactly? Update your question. – Craig Ringer Sep 19 '12 at 08:55
0

You should create the database like this:

CREATE DATABASE <db name>
    WITH OWNER <owner user name>
    TEMPLATE template0
    ENCODING 'UTF-8'
    LC_COLLATE 'tr_TR.UTF-8'
    LC_CTYPE 'tr_TR.UTF-8';