2

I have UTF-8 database with Turkish text data stored in it. When it comes to Turkish there is problem when converting to uppercase or lowercase. Unlike other latin character based languages, turkish has different conversion rule for "i" and "I" characters. The problem is very common among RDBMS products. Most commercial and some open source RDBMS solved this issue. But not Firebird, despite the fact that it is very popular among turkish developers. BTW, it is not issue when database character set is ISO8859-9 (Turkish).

"i" -> uppercase -> "İ"

"ı" -> uppercase -> "I"

As far as I know firebird does not have collation for unicode/turkish .

So work "ikna" uppercased as "IKNA" when it should be "İKNA"

Does any one has workaround solution for such cases? Specifically, I want case incensitive LIKE search on text data.

Pretty informative http://www.moserware.com/2008/02/does-your-code-pass-turkey-test.html

tJeyhun
  • 195
  • 1
  • 9
  • You may need to add the full ICU library and [create your own collation](https://www.firebirdsql.org/file/documentation/reference_manuals/fblangref25-en/html/fblangref25-ddl-collation.html). This is - IMHO - not well documented though. You may want to consider to also ask on the firebird-support mailing list. – Mark Rotteveel Dec 08 '18 at 15:08
  • The necessary statement should be something along the lines of `create collation unicode_turkey for UTF8 from unicode 'LOCALE=tr_TR'`, but on Windows at least this fails with a _"Invalid collation attributes"_ error as the Firebird install only has a minimized ICU install (and I don't know how/where to get a full ICU library to get it to actually work). – Mark Rotteveel Dec 08 '18 at 15:19
  • I tried on Ubuntu, and above statement works on Ubuntu, but it then has no effect on `upper`. I'm not sure if it is incomplete, or just ignored by `upper`. – Mark Rotteveel Dec 08 '18 at 15:42
  • In short, there is not proper solution for this problem. Anyways, thank you @MarkRotteveel, as always for fast respond. – tJeyhun Dec 09 '18 at 13:09
  • You could try on firebird-support, maybe someone has an idea why this doesn't work (maybe collation-sensitive upper case isn't supported at all (although that seems to be contradicted by it working for ISO8859-9), maybe my create collation statement isn't correct, etc). Alternatively, consider creating a improvement request on http://tracker.firebirdsql.org/ to add Turkish locale support for UTF8 that correctly handles upper and lower. – Mark Rotteveel Dec 09 '18 at 13:33
  • @CeyhunMamedov by your surname it seems you can read Russian, if not - use http://www.translate.ru // Here is example of making ad hoc charset casting instead of creating collation permanently: https://www.sql.ru/forum/actualutils.aspx?action=gotomsg&tid=1297225&msg=21527259 // Here can be some useful link to ICU page https://www.sql.ru/forum/1300753/nuzhna-pomoshh-s-sortirovkoy?hl=collation such as http://demo.icu-project.org/icu-bin/collation.html and others – Arioch 'The Dec 10 '18 at 09:10

1 Answers1

0

Ceyhun, did you try character set WIN1254 collate pxw_turk ?

create table test
(
    recid integer,
    name_ varchar(50) character set WIN1254 collate pxw_turk
);
commit;

insert into test(name_, recid) values('İsmail Ilgın', 1);

select
    name_,
    upper(name_),
    lower(name_)
from test;

or with use of domain

create domain mydomain as varchar(50) character set WIN1254 collate pxw_turk;
create table test
(
    recid integer,
    name_ mydomain
);
commit;

insert into test(name_, recid) values('İsmail Ilgın', 1);

select
    name_,
    upper(name_),
    lower(name_)
from test;

And result is :

'İsmail Ilgın' 'İSMAİL ILGIN' 'ismail ılgın'