0

I am using Delphi 2009 with IBX on Firebird 3 database (I have no choice to choose other technologies, I have to adapt to the situation). I have the following defintions:

Firebird BLOB field is defined as:

BLOB SUB_TYPE 0 SEGMENT SIZE 80

TWideMemoField is defined as:

object MainQryNOTES: TWideMemoField
  FieldName = 'NOTES'
  Origin = 'INVOICES.NOTES'
  ProviderFlags = [pfInUpdate]
  BlobType = ftWideMemo
end

The test string is "Цель по инфляции, %" and in it can be read from the BLOB field in the IBExpert software as:

26 04 35 04 3B 04 4C 04 20 00 3F 04 3E 04 20 00
38 04 3D 04 44 04 3B 04 4F 04 46 04 38 04 38 04
2C 00 20 00 25 00

The strange thing is that the Delphi inverts byte order, e.g. cyrillic character Ц has HEX UTF8 representation as 04 26, but it is stored in database as 26 04 and the similar situation is exactly with the other characters as well (one can check this with the help of tables https://www.w3schools.com/charsets/ref_utf_basic_latin.asp and https://www.w3schools.com/charsets/ref_utf_cyrillic.asp). In my case I have only 2-byte charactes, but I guess that the similar situation will be with 3 and 4 byte UTF8 characters as well.

So - how can I configure TWideMemoField to ask not to convert byte order of UTF8 strings?

TomR
  • 2,696
  • 6
  • 34
  • 87
  • 1
    Why do you ask (practically) the same question again, instead of editing the original? Ref: https://stackoverflow.com/q/52138881/2292722 – Tom Brunberg Sep 03 '18 at 07:11
  • In this question I tried to understand how to configure the save from TWideMemoField to database in UTF8 format and in that question I will try to understand how can I convert UTF16LE to UTF8 in PHP. – TomR Sep 03 '18 at 07:18
  • try sql.ru forum, maybe there were people who met it. Also, maybe you can find a newer IBX than d2009 out of the box – Arioch 'The Sep 03 '18 at 09:22
  • 2
    This your BLOB field is binary not textual, so I wonder if Firebird would even try to convert it. If anything, you did specify neither the blob field charset/collation nor the connection charset you set up. On BLOB subtypes: https://www.firebirdsql.org/file/documentation/reference_manuals/fblangref25-en/html/fblangref25-datatypes-bnrytypes.html – Arioch 'The Sep 03 '18 at 09:24

1 Answers1

3

Your text is not encoded as UTF8, it is encoded as UTF16. The character Ц is U+0426. And by convention the 16 bit code unit is being stored in little endian byte order, $26 $04.

In other words, everything is behaving as expected and as designed and I can see no need for you to try to fix anything because nothing is broken.

David Heffernan
  • 601,492
  • 42
  • 1,072
  • 1,490
  • A reason to fix this is that Firebird itself has no UTF-16 support, which means that you need to handle it as binary data instead of being able to handle the content in the database as text (that is: you can't properly use some of the built-in database functions and operators for string manipulation). – Mark Rotteveel Sep 03 '18 at 13:57
  • @MarkRotteveel So you are saying the issue is that UTF-16 is used at all, and not the byte order? The way forward would be to make sure that UTF-8 was used? – David Heffernan Sep 03 '18 at 14:04
  • Only the OP can be sure of that, but I'd say yes, the fact UTF-16 is persisted to the database at all is a potential problem, and the OP may want to fix that and ensure UTF-8 is stored instead. – Mark Rotteveel Sep 03 '18 at 14:06
  • @MarkRotteveel that would require him to change the field type on the fly, which might be hard if dependencies – Arioch 'The Sep 04 '18 at 07:19