DB2 z/OS: Problem to create UDF to return UNICODE string

Question

I need to create user defined function to return utf-8 UNICODE string. When I am trying to create it I get the error message:

THE ENCODING SCHEME SPECIFIED FOR THE FUNCTION MUST BE THE SAME AS THE CONTAINING OR TARGET TABLE SPACE OR OTHER PARAMETERS. SQLCODE=-874, SQLSTATE=53091, DRIVER=4.19.49

With EBCDIC it works fine.

Below is DDL I use:

CREATE FUNCTION US15030.BIN2TEXT (VARCHAR(1024) FOR BIT DATA, INTEGER )
    RETURNS VARCHAR(1024) FOR MIXED DATA CCSID UNICODE
    SPECIFIC US15030.SIWBI2TXV
    EXTERNAL NAME SIWBI2TX
    LANGUAGE C
    PARAMETER STYLE DB2SQL
    DETERMINISTIC
    FENCED
    CALLED ON NULL INPUT
    NO SQL
    NO EXTERNAL ACTION
    ALLOW PARALLEL
    WLM ENVIRONMENT DB2EAPP1
    ASUTIME LIMIT 10
    NOT SECURED;

score 0 · Answer 1 · answered Sep 02 '21 at 21:19

Now, after long and and unhurried conversation with IBM support, I can see how far I was from solution of my initial idea. The aim was to develop UDF that will take text in binary from char for bit data column in the table and return the text in unicode. The problem is that the table I am working with contains EBCDIC texts accumulated from different counties which use different CCSIDs. Different rows can contain text encoded with different CCSID. That's why I need second integer parameter for CCSID.

The correct definition for the function would be with

CREATE FUNCTION US15030.BIN2TEXT (VARCHAR(2023) FOR BIT DATA CCSID UNICODE, INTEGER) RETURNS VARGRAPHIC(2023) CCSID UNICODE

in the first two lines. That sounds a bit strange. Why should I assign CCSID UNICODE for parameter that contains a binary data which definitely are not in UNICODE? I do not expect any conversion on input. But that's how DB2 is designed. It requires all character parameters be declared in the same CCSID. Otherwise it refuses to accept declaration. As the matter of fact it does not do any harm. No conversion is done for CHAR BIT DATA.

I had to change output parameter from VARCHAR to VARGRAPHIC by advice from IBM support.

Another error is contained in my question itself. I expected function to return text in UTF-8. That is bad idea. DB2 interface with C can work only with UTF-16 unicode. DB2 takes data from UDF in UTF-16 and if you insert result into text column of unicode table it is converted to UTF-8 automatically.
Having no experience in external UDF development in C, I decided to start from some working sample provided by IBM. I found one but only for EBCDIC data. My own one based on the sample worked fine with returned EBCDIC VARCHAR.

First lines of DDL definition looked as:

CREATE FUNCTION US15030.BIN2TEXT (VARCHAR(1024) FOR BIT DATA, INTEGER )
RETURNS VARCHAR(1024) FOR MIXED DATA CCSID EBCDIC

And parameters in C code looked as:

void SIWBI2TX                         /* main routine               */
 ( char        *inString,              /* in: string to convert      */
   int         *inCCSID,               /* in: CCSID of string to conv*/
   char        *outString,             /* out: converted string      */
   short int   *ninString,             /* in: indic for string to co */
   short int   *nCCSID   ,             /* in: indic for inCCSID      */
   short int   *noutString,            /* out: indic for outString   */

   char        *sqlstate,              /* out: SQLSTATE              */
   char        *fnName,                /* in: family name of function*/
   char        *specificName,          /* in: specific name of func  */
   char        *message                /* out: diagnostic message    */
 );

To check if I can transfer parameters in and out correctly, I just returned to output the string that I provided on input in hex form. The simplest string for me was € which is 20AC in utf-16 or E282AC in UTF-8. My test code received and returned it without distortion in both forms, but DB2 did not show it as text while hex on output it looked same as on input.

When I changed DDL as in item 1 above, my C code stopped to work. The solution was to change input and output parameter types as:

typedef unsigned short sqldbchar;

struct VARGRAPH {
short len;
sqldbchar data[2023];
};

struct VARCHAR {
short len;
char data[2023];
};

void SIWBI2TX
(  struct VARCHAR *inString,           /* in: string to convert    */
   int *inCCSID,
   struct VARGRAPH *outString,
   short int *ninString,           /* in: indic for string to co */
   short int *nCCSID,
   short int *noutString,

   char        *sqlstate,              /* out: SQLSTATE              */
   char        *fnName,                /* in: family name of function*/
   char        *specificName,          /* in: specific name of func  */
   char        *message                /* out: diagnostic message    */  

)

With this definition output string is placed in outString->data and should be encoded as UTF-16.

Another problem that I met after change in item 3 was that on the case of error, diagnostic message and SQLCODE stopped to work. The clue was that with UTF-16 in returned value, the message and SQLCODE have to be in UTF-16 too. But specific UTF-16. As far as definitions for them is not changed and they are still defined as C char *, which means only one byte for character, they expected to contain only one lower byte of UTF-16 code. This kind of encoding is possible only for English text and digits. But that is enough for error messages and codes.

I hope my long post helps somebody. The problem is that this information is absent in IBM documentation related to user defined functions and procedures.

DB2 z/OS: Problem to create UDF to return UNICODE string

1 Answers1