
I am joining two files. One is an extract from a table (in0 port) whose key field has the record format utf8 string("\x01", maximum_length=3).
The other is a plain text file (in1 port) whose key field has the record format ascii string(3).

While joining, I get the error below:

Field "company" in key specifier for input in1 has type "ascii string(3)",
but field "kg3_company_cd" in key specifier for input in0 has type "utf8 string("\x01", maximum_length=3)".
This join may be attempted in spite of the type mismatch by
setting configuration variable AB_ALLOW_DANGEROUS_KEY_CASTING to true.
However, typically the input streams will have been hash-partitioned on
the join keys of different types, making it unlikely that all equal join.
Robert
Nitish

1 Answer


The issue is that a utf8 string and an ascii string use different underlying data to represent the same value. The error message is warning you that if your join runs in parallel, the hash partitioning algorithm has likely sent matching key values from each flow to different partitions, because the underlying data representing the "equal" strings is different. For example, if both flows have 3 records each with key field values ("A", "AB", "ABC"), key "AB" may be on partition 0 for one flow but on partition 7 for the other. Your join component runs one instance per partition and expects the data to be partitioned correctly. The instance for partition 0 will see key "AB" on one flow but not the other. If it's an inner join, the output will contain only those matching key records that were coincidentally sent to the same partition.

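You should pick which string encoding you want and ensure both flows use it before the join. Just add a Reformat on one of the flows, ahead of the join and ahead of any partitioning on the key, to convert its key field to the other flow's type.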
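
To make the partitioning argument concrete, here is a minimal Python sketch rather than Ab Initio code. Everything in it is an assumption for illustration: the byte layouts (a space-padded fixed-width ascii string(3) versus a utf8 string terminated by "\x01"), the partition count of 8, and the CRC32-based partitioner, which only stands in for whatever hash Ab Initio actually uses. The helper names are hypothetical.

```python
import zlib

NUM_PARTITIONS = 8  # assumed degree of parallelism, for illustration only


def as_ascii_fixed(key: str, width: int = 3) -> bytes:
    """Assumed bytes of a fixed-width ascii string(3): space-padded to `width`."""
    return key.ljust(width).encode("ascii")


def as_utf8_delimited(key: str) -> bytes:
    """Assumed bytes of a utf8 string terminated by the \\x01 delimiter."""
    return key.encode("utf-8") + b"\x01"


def partition_of(raw: bytes, partitions: int = NUM_PARTITIONS) -> int:
    """Stand-in hash partitioner (CRC32 mod N); NOT Ab Initio's algorithm."""
    return zlib.crc32(raw) % partitions


def matched_keys(keys, encode_in0, encode_in1):
    """Keys whose in0 and in1 records land on the same partition."""
    return [k for k in keys
            if partition_of(encode_in0(k)) == partition_of(encode_in1(k))]


keys = ["A", "AB", "ABC"]

# Mismatched types: each flow hashes different bytes for the same key,
# so any co-location is a coincidence.
mismatched = matched_keys(keys, as_utf8_delimited, as_ascii_fixed)

# After converting in0 to the same type as in1, the bytes are identical,
# so every key is guaranteed to land on the same partition in both flows.
normalized = matched_keys(keys, as_ascii_fixed, as_ascii_fixed)

print("co-located with mismatched types:", mismatched)
print("co-located after normalizing:   ", normalized)
```

With matching types every key is co-located by construction. With mismatched types the first list depends entirely on the hash, which is why an inner join on such flows can silently drop rows.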

Matt