I need to translate text from and into French, Dutch and German using Delphi 2006 (without any third-party units or components).

These three languages use code page 1252. Our database is UTF-8 compliant, so at the moment I rely on all the values in the tables being UTF-8. Can I be confident in this assumption? Will it work well, or should I worry about differences between UTF-8 and code page 1252, if there are any? I don't understand the difference between UTF-8 and code pages (for example, I understood that byte values 0 to 127 are the same, and that values from 128 upwards differ).

Second, I need to search on some fields. Can I rely on the AnsiUpperCase function in D2006, or should I write a custom function that handles each special character?

Later edit: the data is stored in UTF-8 format.

Thanks in advance!

Warren P
RBA
  • I would like your question better if you asked about one subject per question. It seems useless to the whole world, except you, because you have combined so much unrelated stuff into one question. Everyone benefits if you break your questions up. Also, since your DB is already UTF-8, I wonder why you would keep using an ANSI Delphi version? XE/XE2 are a natural upgrade. Just a thought. – Warren P Dec 01 '11 at 01:36

1 Answer

  1. The database being UTF8-compliant doesn't mean the data is actually stored in UTF8. E.g. in Firebird (which is UTF8-compliant) you can declare tables using ANSI character sets.
  2. You'll need to convert from UTF8 to ANSI 1252 and vice versa, for example with the UTF8Encode and UTF8Decode routines.
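
The round trip described in point 2 could look roughly like the sketch below. It assumes Delphi 2006, where `UTF8String` and `string` are both single-byte strings, and it relies on the RTL's implicit `WideString`/`AnsiString` conversion, which uses the machine's current ANSI code page (1252 only if the system is configured that way). The helper name `UpperCaseUtf8ViaAnsi` and the sample value are made up for illustration.

```delphi
program Utf8RoundTrip;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: upper-cases a UTF-8 value by going through ANSI.
function UpperCaseUtf8ViaAnsi(const Utf8Value: UTF8String): UTF8String;
var
  Wide: WideString;
  Ansi: AnsiString;
begin
  Wide := UTF8Decode(Utf8Value);  // UTF-8 -> UTF-16
  Ansi := Wide;                   // UTF-16 -> system ANSI code page (implicit RTL conversion)
  Ansi := AnsiUpperCase(Ansi);    // locale-aware upper-casing on the ANSI string
  Wide := Ansi;                   // ANSI -> UTF-16
  Result := UTF8Encode(Wide);     // UTF-16 -> UTF-8, ready to post back to the database
end;

var
  Source, Upper: UTF8String;
  I: Integer;
begin
  // Sample value (assumption): "caf" followed by U+00E9 encoded as the UTF-8 bytes C3 A9.
  Source := 'caf'#$C3#$A9;
  Upper := UpperCaseUtf8ViaAnsi(Source);

  // Print the resulting bytes in hex, because the console may not display UTF-8 correctly.
  for I := 1 to Length(Upper) do
    Write(IntToHex(Ord(Upper[I]), 2), ' ');
  Writeln;
end.
```

Characters that have no representation in the ANSI code page are lost in the UTF-16 to ANSI step, so this approach only suits data that is known to fit into that code page.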
Ondrej Kelle
  • I will clarify the question. Data is stored in UTF-8 – RBA Nov 30 '11 at 10:01
  • And after I decode the UTF-8, will AnsiUpperCase give me the correct result, or should I process the string myself? – RBA Nov 30 '11 at 10:41
  • Yes, after converting to ANSI, you can use ANSI string functions normally. Don't forget to convert results back to UTF8 to post to the database. – Ondrej Kelle Nov 30 '11 at 10:55
  • Note that `UTF8Encode()` expects a `WideString` as input, and `UTF8Decode()` returns a `WideString` as output. So really, you have to go from `UTF8->UTF16->ANSI` when reading the data, and from `ANSI->UTF16->UTF8` when writing the data. You have to be VERY careful with the `UTF16->ANSI` and `ANSI->UTF16` conversions in particular. If you must use codepage 1252 specifically, then use it explicitly in code by calling `MultiByteToWideChar()` and `WideCharToMultiByte()` directly. Do not rely on the RTL always using CP1252 on all machines. – Remy Lebeau Dec 02 '11 at 01:59
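
A minimal sketch of the explicit code page 1252 conversion suggested in the last comment, assuming Delphi 2006 and the `Windows` unit's declarations of `MultiByteToWideChar` and `WideCharToMultiByte`. The constant `CP_WINDOWS_1252` and the helper names `WideToCp1252` and `Cp1252ToWide` are invented for this example and are not part of the RTL.

```delphi
program ExplicitCp1252;

{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

const
  CP_WINDOWS_1252 = 1252; // illustrative name for the code page identifier

// Hypothetical helper: UTF-16 -> code page 1252, independent of the system ANSI code page.
function WideToCp1252(const W: WideString): AnsiString;
var
  Len: Integer;
begin
  Result := '';
  if W = '' then
    Exit;
  // First call asks for the required buffer size in bytes.
  Len := WideCharToMultiByte(CP_WINDOWS_1252, 0, PWideChar(W), Length(W),
    nil, 0, nil, nil);
  if Len <= 0 then
    RaiseLastOSError;
  SetLength(Result, Len);
  if WideCharToMultiByte(CP_WINDOWS_1252, 0, PWideChar(W), Length(W),
    PAnsiChar(Result), Len, nil, nil) = 0 then
    RaiseLastOSError;
end;

// Hypothetical helper: code page 1252 -> UTF-16.
function Cp1252ToWide(const A: AnsiString): WideString;
var
  Len: Integer;
begin
  Result := '';
  if A = '' then
    Exit;
  // First call asks for the required buffer size in wide characters.
  Len := MultiByteToWideChar(CP_WINDOWS_1252, 0, PAnsiChar(A), Length(A), nil, 0);
  if Len <= 0 then
    RaiseLastOSError;
  SetLength(Result, Len);
  if MultiByteToWideChar(CP_WINDOWS_1252, 0, PAnsiChar(A), Length(A),
    PWideChar(Result), Len) = 0 then
    RaiseLastOSError;
end;

var
  FromDb, ToDb: UTF8String;
  Ansi: AnsiString;
begin
  // Sample value (assumption): "caf" followed by U+00E9 as UTF-8 bytes.
  FromDb := 'caf'#$C3#$A9;

  // Reading: UTF-8 -> UTF-16 -> CP1252.
  Ansi := WideToCp1252(UTF8Decode(FromDb));

  // Writing: CP1252 -> UTF-16 -> UTF-8.
  ToDb := UTF8Encode(Cp1252ToWide(Ansi));

  Writeln('Round trip preserved the bytes: ', FromDb = ToDb);
end.
```

Note that `AnsiUpperCase` itself still follows the user's locale, so only the conversion steps are pinned to code page 1252 here.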