Using PDI (Kettle), I am filling the staging area of my database with a CSV Input
and a Table Output
step. This works great; however, I also want to make sure that the data that was just inserted fulfills certain criteria, e.g. fields not being NULL, etc.
Normally this would be a job for database constraints, but we want to keep the data in the database even if it's faulty (for debugging purposes; trying to debug a .csv file directly is a pain...). Since it is just a staging table anyway, it doesn't cause any integrity problems.
To do just that, I wrote some SELECT COUNT(*) AS test123 ...
statements that immediately show whether something is wrong and are easy to evaluate: if the value of test123 is 0, all is good; otherwise the job needs to be aborted.
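Outside of PDI, the check logic I have in mind amounts to the following sketch. This is just an illustration using an in-memory SQLite database; the table and column names are made up, and the real queries run against my staging table via the Execute SQL Statements step:

```python
import sqlite3

# Hypothetical staging table; names are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO staging VALUES (?, ?)",
    [(1, "a"), (2, None), (3, "c")],  # row 2 violates the NOT-NULL criterion
)

# The check: count the rows that violate a criterion (here: NULL name).
# test123 == 0 means the data is clean; anything else should abort the job.
row = conn.execute(
    "SELECT COUNT(*) AS test123 FROM staging WHERE name IS NULL"
).fetchone()
test123 = row[0]
print(test123)  # 1 -> one faulty row, so the job should be aborted
```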
I am executing these statements with an Execute SQL Statements
step within a PDI transformation. I expected the result to be passed to my data stream automatically, so I also used a Copy rows to result
step to pass it up to the executing job.
This is the point where the problem is most likely located.
I think the result of the SELECT statement is not automatically passed to my data stream, because when I do a Simple evaluation
in the main job using the variable ${test123}
(which I assumed would be created implicitly by executing SELECT COUNT(*) AS test123 ...
), I never get the expected result.
I couldn't find any clues to this problem in the PDI documentation, so I hope that someone here has some experience with PDI and might be able to help. If anything is still unclear, just point it out and I will edit the post with more information.
Best regards
Edit: This is a simple model of my main job:
Start --> Load data (Transformation) --> Check data (Transformation) --> Simple Evaluation --> ...