I need to use the Kettle/PDI Community Edition to read large fixed-length data files and do some ETL work on them. During development I ran into the following issue:
The Kettle "Fixed File Input" step allows multiple data types, with the remark that they are actually Strings or byte arrays.
My input contained both: Strings, and byte arrays holding the little-endian (Intel-specific endianness) representations of long, int and short values. Example of a record structure to be read: Column1(char:8), Column2(long:8 hex), Column3(char:2), Column4(int:4 hex).
I tried to use the "Select Values" step to change a column's type from Binary to Integer, but that conversion is not implemented. I finally ended up with the following solution:
- I used a "User Defined Java Class" step with the code pasted below.
As you can see, I used a bit-shifting formula to obtain each long value.
public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
    Object[] r = getRow();
    if (r == null) {
        setOutputDone();
        return false;
    }

    // It is always safest to call createOutputRow() to ensure that your output row's
    // Object[] is large enough to handle any new fields you are creating in this step.
    r = createOutputRow(r, data.outputRowMeta.size());

    // Get the values from the binary input fields.
    byte[] buf;
    long longValue;

    // BAN -> BAN_L, 8 bytes, little-endian
    buf = get(Fields.In, "BAN").getBinary(r);
    longValue = ((buf[0] & 0xFFL) << 0)  | ((buf[1] & 0xFFL) << 8)
              | ((buf[2] & 0xFFL) << 16) | ((buf[3] & 0xFFL) << 24)
              | ((buf[4] & 0xFFL) << 32) | ((buf[5] & 0xFFL) << 40)
              | ((buf[6] & 0xFFL) << 48) | ((buf[7] & 0xFFL) << 56);
    get(Fields.Out, "BAN_L").setValue(r, longValue);

    // DEPOSIT_PAID_AMT -> DEPOSIT_PAID_AMT_L, 4 bytes
    buf = get(Fields.In, "DEPOSIT_PAID_AMT").getBinary(r);
    longValue = ((buf[0] & 0xFFL) << 0)  | ((buf[1] & 0xFFL) << 8)
              | ((buf[2] & 0xFFL) << 16) | ((buf[3] & 0xFFL) << 24);
    get(Fields.Out, "DEPOSIT_PAID_AMT_L").setValue(r, longValue);

    // BILL_SEQ_NO -> BILL_SEQ_NO_L, 2 bytes
    buf = get(Fields.In, "BILL_SEQ_NO").getBinary(r);
    longValue = ((buf[0] & 0xFFL) << 0) | ((buf[1] & 0xFFL) << 8);
    get(Fields.Out, "BILL_SEQ_NO_L").setValue(r, longValue);

    // Send the row on to the next step.
    putRow(data.outputRowMeta, r);
    return true;
}
Problems arise when a single data extract contains 8-20 binary fields. Is there an alternative to this approach, so that I could call something like:
getNumberFromLE(byte[] buf, int length);
Is there any other plugin, perhaps in development, that can be used to transform a byte[] into the Pentaho Kettle "Number" data type? (BigNumber and Integer would also be fine.)
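For reference, here is a sketch of the kind of helper I have in mind, generalizing the shift-and-OR pattern from the step code above. The name `getNumberFromLE` is my own; as far as I know it does not exist in Kettle/PDI:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class LittleEndianReader {

    // Interpret the first 'length' bytes of 'buf' (1..8) as a little-endian
    // unsigned integer and return it as a long. This generalizes the
    // per-field shift-and-OR blocks above; the name is hypothetical,
    // not an existing Kettle/PDI API.
    public static long getNumberFromLE(byte[] buf, int length) {
        long value = 0L;
        for (int i = 0; i < length; i++) {
            value |= (buf[i] & 0xFFL) << (8 * i);
        }
        return value;
    }

    public static void main(String[] args) {
        // 0x0201 stored little-endian as {0x01, 0x02} -> 513
        byte[] twoBytes = { 0x01, 0x02 };
        System.out.println(getNumberFromLE(twoBytes, twoBytes.length));

        // For fields that are exactly 8 bytes, java.nio.ByteBuffer
        // can do the same conversion:
        byte[] eight = { 1, 0, 0, 0, 0, 0, 0, 0 };
        long viaBuffer = ByteBuffer.wrap(eight).order(ByteOrder.LITTLE_ENDIAN).getLong();
        System.out.println(viaBuffer);
    }
}
```

Inside the "User Defined Java Class" step each field conversion would then shrink to `longValue = getNumberFromLE(buf, buf.length);`. One caveat: for 8-byte fields with the top bit set, the result is negative, since Java's long is signed.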