I'm trying to read data from DB2 9.7 with DataStage 9.1 using the Runtime Column Propagation (RCP) feature. The general approach is to have the DB2 Connector stage generate the query SQL by specifying only the connection details and the table name. This works in most cases, but I'm running into a problem with multibyte characters.
When I have this setup and the data being read is in one of the CJK (Chinese, Japanese, or Korean) languages, I see ^Z characters in the sequential file that I write to. When I write the same data to a Data Set, I can see that the schema incorrectly specifies the column's data type as string instead of ustring. From reading the default type conversion documentation, it appears that DataStage reads the data from DB2 as a ustring but then tries to put it into the generated Data Set as a string.
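To illustrate what I mean (the column name and length below are made up for the example, not taken from my actual table), the Data Set schema that RCP generates looks something like the first record, whereas I would expect something like the second for CJK data:

```
// Schema as generated by RCP (illustrative column name and length):
record (
  PRODUCT_NAME: string[max=128];
)

// Schema I would expect for CJK character data:
record (
  PRODUCT_NAME: ustring[max=128];
)
```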
When I generate a Table Definition for the table, I can see that it generates the columns with the string type. However, when I load that Table Definition into a DB2 Connector stage, there is a checkbox, checked by default, that will automatically add the Unicode extended attribute to character columns.
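In other words (again using a made-up column purely for illustration), the same column seems to map differently depending on that extended attribute:

```
// Table Definition column as generated:
//   Column: PRODUCT_NAME   SQL type: VarChar   Length: 128   Extended: (blank)
//   -> treated as string[max=128]
//
// Same column loaded with the Unicode extended attribute added:
//   Column: PRODUCT_NAME   SQL type: VarChar   Length: 128   Extended: Unicode
//   -> treated as ustring[max=128]
```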
With all of the above in mind, it seems that DataStage is capable of generating ustring columns instead of string columns, but it doesn't do so for runtime column propagation.
In short: is there any way to convince DataStage to generate ustring columns when using RCP? Is there a setting on the DB2 Connector stage? Is there a setting at the project level? Is there an environment variable that controls this? If not, do I need to build Table Definitions for every table and write a custom job to extract that data, solely because of the problem I'm seeing?