I need to read these data and then load them into Oracle tables using (if possible) a single Spoon transformation.
The data are stored in pairs of files. Each pair has one data file (data-type_yyyymmdd.txt) with one record per line, terminated by a carriage return,
and no delimiters between the fields. The key to extracting these fields is the second file (data-type_yyyymmdd.ref), which contains the information needed to map the fields along each row. Let's take a look at one example:
data-type_yyyymmdd.ref file layout:
OPERATION_ID A 011
USER A 008
DATE_TRANSACTION A 0008
TIME_TRANSACTION N 0004
The first column is the field name, the second is the data type (A = alphanumeric, N = numeric), and the third is the field length. The columns have fixed positions along the line.
IMPORTANT: pairs may have REF files with different structures, but pairs with the same kind of REF file (identified by the data-type
part of the file name) have the same structure.
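To make the layout concrete, here is a minimal sketch (in plain Python, outside Spoon) of how a REF file like the one above could be parsed into field specs; the whitespace-separated column format is an assumption based on the sample:

```python
# Parse a .ref layout file into (name, type, length) field specs.
# Assumes the three columns are whitespace-separated, as in the sample.
def parse_ref(lines):
    fields = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, ftype, length = line.split()
        fields.append((name, ftype, int(length)))  # int() drops leading zeros
    return fields

ref_text = """\
OPERATION_ID A 011
USER A 008
DATE_TRANSACTION A 0008
TIME_TRANSACTION N 0004
"""

specs = parse_ref(ref_text.splitlines())
# specs == [('OPERATION_ID', 'A', 11), ('USER', 'A', 8),
#           ('DATE_TRANSACTION', 'A', 8), ('TIME_TRANSACTION', 'N', 4)]
```

In Spoon itself the same idea maps to reading the REF file first and passing the resulting field metadata to the step that reads the data file (e.g. via metadata injection).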
data-type_yyyymmdd.txt file layout:
A line of this file:
50593749120ABCDEFGH201701021444
The fields for this line have these values:
OPERATION_ID=50593749120
USER=ABCDEFGH
DATE_TRANSACTION=20170102
TIME_TRANSACTION=1444
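The extraction above is plain fixed-width slicing driven by the REF field lengths. A minimal sketch, assuming the (name, type, length) specs from the REF file are already available:

```python
# Slice one fixed-width data line according to the REF field specs.
# N-typed fields are converted to int; A-typed fields stay as strings.
def parse_line(line, specs):
    row, pos = {}, 0
    for name, ftype, length in specs:
        raw = line[pos:pos + length]
        row[name] = int(raw) if ftype == 'N' else raw
        pos += length
    return row

specs = [('OPERATION_ID', 'A', 11), ('USER', 'A', 8),
         ('DATE_TRANSACTION', 'A', 8), ('TIME_TRANSACTION', 'N', 4)]
row = parse_line('50593749120ABCDEFGH201701021444', specs)
# row['USER'] == 'ABCDEFGH', row['TIME_TRANSACTION'] == 1444
```

In a PDI transformation the equivalent is a "Fixed file input" (or "Text file input" with fixed widths) step, with the widths supplied from the REF file.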
So the Pentaho transformation has to read these files, extract the rows/fields, and insert them into a table.
How do we identify the table to insert these data into?
The key is the data-type
part of the file name. We have a set of different tables, and each table's name is identified by this part. Each table's structure matches its related REF file. Example:
The file pair userlogins_20170701.ref
and userlogins_20170701.txt
holds data for the userlogins
Oracle table. Let's suppose this pair has the same structure as the example above. Then the table has this structure:
ID - numeric(16)
OPERATION_ID - varchar2(11)
USER - varchar2(8)
DATE_TRANSACTION - varchar2(8)
TIME_TRANSACTION - numeric(4)
All tables have the ID field as primary key, filled by a database sequence. The other fields follow the REF file structure: fields with type A map to VARCHAR2 columns and fields with type N to numeric columns.
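Putting the naming and type rules together, here is a hedged sketch of deriving the table name from the file name and turning REF specs into Oracle column definitions. The ID column and the exact NUMBER/VARCHAR2 sizing are assumptions for illustration, following the example table above:

```python
import os

# Derive the target table name: the data-type part of the file name,
# i.e. everything before the first underscore.
def table_name(path):
    return os.path.basename(path).split('_')[0]

# Build a CREATE TABLE statement from (name, type, length) REF specs.
# A -> VARCHAR2(len), N -> NUMBER(len); ID column added per the rule above.
def build_ddl(table, specs):
    cols = ['ID NUMBER(16) PRIMARY KEY']
    for name, ftype, length in specs:
        sqltype = f'NUMBER({length})' if ftype == 'N' else f'VARCHAR2({length})'
        cols.append(f'{name} {sqltype}')
    return f'CREATE TABLE {table} (\n  ' + ',\n  '.join(cols) + '\n)'

specs = [('OPERATION_ID', 'A', 11), ('USER', 'A', 8),
         ('DATE_TRANSACTION', 'A', 8), ('TIME_TRANSACTION', 'N', 4)]
print(build_ddl(table_name('userlogins_20170701.txt'), specs))
```

Note that in Oracle USER is a reserved word, so a real table would need that column quoted or renamed; in Spoon the table name derived this way can feed a "Table output" step whose target table is set from a field.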