I'm trying to read data from DB2 9.7 with DataStage 9.1 using the Runtime Column Propagation (RCP) feature. The general approach is to have the DB2 Connector stage generate the query SQL by specifying only the connection details and the table name. This works in most cases, but I'm running into a problem with multibyte characters.
When I have this setup and the data being read is in one of the CJK (Chinese, Japanese, or Korean) languages, I see ^Z characters in the sequential file that I write to. When I write the same data to a Data Set, I can see that the schema incorrectly specifies the column's data type as string instead of ustring. From reading the default type conversion documentation, it appears that DataStage reads the data from DB2 as a ustring but then tries to put it into the generated Data Set as a string.
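To illustrate what I mean (the column name and length below are made up for the example, not taken from my actual table), the Data Set schema that RCP generates looks something like the first record, whereas I would expect something like the second for CJK data:

```
// Schema as generated by RCP (illustrative column name and length):
record (
  PRODUCT_NAME: string[max=128];
)

// Schema I would expect for CJK character data:
record (
  PRODUCT_NAME: ustring[max=128];
)
```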
When I generate a Table Definition for the table, I can see that it generates the columns with the string type. However, when I load that Table Definition into a DB2 Connector stage, there is a checkbox, checked by default, that will automatically add the Unicode extended attribute to character columns.
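In other words (again using a made-up column purely for illustration), the same column seems to map differently depending on that extended attribute:

```
// Table Definition column as generated:
//   Column: PRODUCT_NAME   SQL type: VarChar   Length: 128   Extended: (blank)
//   -> treated as string[max=128]
//
// Same column loaded with the Unicode extended attribute added:
//   Column: PRODUCT_NAME   SQL type: VarChar   Length: 128   Extended: Unicode
//   -> treated as ustring[max=128]
```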
With all of the above in mind, it seems that DataStage is capable of generating ustring columns instead of string columns, but it doesn't do so for runtime column propagation.
In short: is there any way to convince DataStage to generate ustring columns when using RCP? Is there a setting on the DB2 Connector stage? Is there a setting at the project level? Is there an environment variable that controls this? If not, do I need to build Table Definitions for every table and write a custom job to extract that data, solely because of the problem I'm seeing?