Problem
I use dataserver to create records in MSSQL. I transfer data in bulk, with for each
. Lets say tens of records. As I transfer those records with buffer-copy
, I can easily verify the transfer with buffer-compare
.
For unknown reasons, some records are malformed. When this happens, all consecutive records get malformed as well. This malformation means that character
fields transfer with chinese characters in them. The first record that does this is only partially malformed. Meaning that some fields are ok, one is partially ok with the rest in chinese, and the rest of fields are completely in chinese. Character fields of all other records get malformed completely.
I am not able to artificially produce this, there is no known (to me) character or character sequence that does this. For a while we believed that it was chr(10)
and replacing it with an empty string solved the problem. But it is back again.
Code I use to do it
Lets say I have a source oe_table
in OpenEdge database, ms_table
in MSSQL which I access through DataServer. Following code example is slightly modified excerpt from the actual program:
or here: https://pastebin.com/MFF4FEQ3
Note: The actual for each
has where
clausule and the buffer-copy
has except
option for two fields (of course this is reflected in buffer-compare
also).
Note: We came up with the idea to raw-transfer
the record, because we could not find anything strange in the fields specified by cUnequalFields
.
What we discovered
Contents of oe_table.bin
:
Contents of
ms_table.bin
:
Contents of
tt_table.bin
:
Contents of
ms_table_ok.bin
, when the record does not get malformed (as to how did I get that record to not get malformed, I will get to that later):
The obvious thing here is this: oe_table
and tt_table
has a codepage specified, while ms_table
does not. It is even more obvious when comparing the the record in ms_table
that gets malformed with the exact same record in ms_table_ok
that does not get malformed. While the ms_table_ok
has UTF-8
specified, the ms_table
does not have any.
Another interesting thing - look at the 0x250
row in ms_table
and compare it to the 0x260
row in ms_table_ok
. Look at the data "system". For some reason, ms_table
treats it as single-byte, while the ms_table_ok
treats it as double-byte. Also, ms_table
does this even though it is not the first character field in the table. So he treats the preceeding field as double-byte, then treats this one as single-byte, then continues to treat remaining fields as double-byte again.
Next thing I just realized is that "utf-8", which is in ms_table
missing, takes five bytes. That is odd number. That may explain different interpretation of strings later on. Look at row 0x320
and 0x330
in ms_table
and ms_table_ok
respectively. While the ms_table_ok
represents single-byte characters in double-byte interpretation as d.
, the ms_table
represents it the other way as .d
. That is one byte shift.
How did I get the fine version of the record
This is also very strange. I experimented with startup parameters on the DataServer. Adding -Dsrv qt_no_debug
or any other -Dsrv qt_debug
means I can transfer the record without a problem, while not having that parameter means it will get malformed. That is the strangest thing in all this by far for me. So with this, I managed to get the ms_table_ok
for comparison.
Then we have the same problem on another dataserver, where this parameter thing does not work at all.
Sorry, what was the question again?
The question is: What is the reason for the codepage not to be specified in the transfer?
Because that looks like the real issue here. But maybe I am not right and am missing something? Or do you have any idea what else can I try? I was convinced that the dataserver settings are not right, but then it works with that parameter and it works for other tables.
I will appreciate ANY advice! Thanks!