I'm new to PDI and still learning it. I'm trying to create a transformation that reads all the CSV files from one folder, checks that the data in each file is correct (meaning there are no rows with missing values, errors, or a wrong format), and then stores it in a database.
What I have tried:
- Use `Text File Input` to access the CSV files over FTP using Apache Commons VFS.
- Validate the data in the CSV (checking the filename and whether the fields exist) with conditions in `Filter Rows`.
- Output into a PostgreSQL table using `Synchronize After Merge`. I used this step because I also join the CSV data with data from another table.
The result of my second step is not what I want. Currently the check runs only after all the CSV files have been read, and all the data is passed to the next step. What I want is to check the data while it is being read, so that only correct data is passed to the next step. How can I do that? Any suggestions? (I'm open to brainstorming.)
If that is impossible to implement in PDI, then it's okay to read all the data and pass it to the next step, as long as it is validated again before being inserted.
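To make the behavior I'm after more concrete, here is a minimal sketch in plain Python (not PDI) of validating rows while they are read, so that only correct rows reach the next step. The column names, the date format, and the helper names `is_valid` and `valid_rows` are all made up for illustration; my real files and rules differ.

```python
import csv
import io
from datetime import datetime

# Hypothetical schema: column name -> parser that raises ValueError on bad input.
SCHEMA = {
    "id": int,
    "name": str,
    "created": lambda s: datetime.strptime(s, "%Y-%m-%d"),
}


def is_valid(row):
    """Return True only if every expected field is present, non-empty, and parseable."""
    for column, parse in SCHEMA.items():
        value = row.get(column)
        if value is None or value.strip() == "":
            return False  # missing or empty field
        try:
            parse(value)
        except ValueError:
            return False  # wrong format
    return True


def valid_rows(lines):
    """Yield rows one at a time as they are read, skipping invalid ones."""
    for row in csv.DictReader(lines):
        if is_valid(row):
            yield row


# Illustrative input only: rows 2, 3 and 4 each break one rule.
sample = io.StringIO(
    "id,name,created\n"
    "1,alice,2020-01-31\n"
    "x,bob,2020-02-01\n"
    "3,,2020-02-02\n"
    "4,dave,02/03/2020\n"
    "5,erin,2020-02-04\n"
)
good = list(valid_rows(sample))
```

Because `valid_rows` is a generator, each row is checked as soon as it is read rather than after the whole file has been loaded, which is the behavior I'd like to reproduce in the transformation.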