Now, after long and and unhurried conversation with IBM support, I can see how far I was from solution of my initial idea. The aim was to develop UDF that will take text in binary from char for bit data column in the table and return the text in unicode. The problem is that the table I am working with contains EBCDIC texts accumulated from different counties which use different CCSIDs. Different rows can contain text encoded with different CCSID. That's why I need second integer parameter for CCSID.
The correct definition for the function would be with
CREATE FUNCTION US15030.BIN2TEXT (VARCHAR(2023) FOR BIT DATA CCSID UNICODE, INTEGER)
RETURNS VARGRAPHIC(2023) CCSID UNICODE
in the first two lines.
That sounds a bit strange. Why should I assign CCSID UNICODE for parameter that contains a binary data which definitely are not in UNICODE? I do not expect any conversion on input. But that's how DB2 is designed. It requires all character parameters be declared in the same CCSID. Otherwise it refuses to accept declaration. As the matter of fact it does not do any harm. No conversion is done for CHAR BIT DATA.
I had to change output parameter from VARCHAR to VARGRAPHIC by advice from IBM support.
Another error is contained in my question itself. I expected function to return text in UTF-8. That is bad idea. DB2 interface with C can work only with UTF-16 unicode. DB2 takes data from UDF in UTF-16 and if you insert result into text column of unicode table it is converted to UTF-8 automatically.
Having no experience in external UDF development in C, I decided to start from some working sample provided by IBM. I found one but only for EBCDIC data. My own one based on the sample worked fine with returned EBCDIC VARCHAR.
First lines of DDL definition looked as:
CREATE FUNCTION US15030.BIN2TEXT (VARCHAR(1024) FOR BIT DATA, INTEGER )
RETURNS VARCHAR(1024) FOR MIXED DATA CCSID EBCDIC
And parameters in C code looked as:
void SIWBI2TX /* main routine */
( char *inString, /* in: string to convert */
int *inCCSID, /* in: CCSID of string to conv*/
char *outString, /* out: converted string */
short int *ninString, /* in: indic for string to co */
short int *nCCSID , /* in: indic for inCCSID */
short int *noutString, /* out: indic for outString */
char *sqlstate, /* out: SQLSTATE */
char *fnName, /* in: family name of function*/
char *specificName, /* in: specific name of func */
char *message /* out: diagnostic message */
);
To check if I can transfer parameters in and out correctly, I just returned to output the string that I provided on input in hex form. The simplest string for me was € which is 20AC in utf-16 or E282AC in UTF-8.
My test code received and returned it without distortion in both forms, but DB2 did not show it as text while hex on output it looked same as on input.
When I changed DDL as in item 1 above, my C code stopped to work.
The solution was to change input and output parameter types as:
typedef unsigned short sqldbchar;
struct VARGRAPH {
short len;
sqldbchar data[2023];
};
struct VARCHAR {
short len;
char data[2023];
};
void SIWBI2TX
( struct VARCHAR *inString, /* in: string to convert */
int *inCCSID,
struct VARGRAPH *outString,
short int *ninString, /* in: indic for string to co */
short int *nCCSID,
short int *noutString,
char *sqlstate, /* out: SQLSTATE */
char *fnName, /* in: family name of function*/
char *specificName, /* in: specific name of func */
char *message /* out: diagnostic message */
)
With this definition output string is placed in outString->data and should be encoded as UTF-16.
- Another problem that I met after change in item 3 was that on the case of error, diagnostic message and SQLCODE stopped to work. The clue was that with UTF-16 in returned value, the message and SQLCODE have to be in UTF-16 too. But specific UTF-16. As far as definitions for them is not changed and they are still defined as C char *, which means only one byte for character, they expected to contain only one lower byte of UTF-16 code. This kind of encoding is possible only for English text and digits. But that is enough for error messages and codes.
I hope my long post helps somebody. The problem is that this information is absent in IBM documentation related to user defined functions and procedures.