I'm still pretty new to Pentaho Spoon. I'd like to know whether the following is achievable.
In the past I had many bad experiences with SSIS, so at the time I decided to develop my own ETL in C#. In practice, the .NET code only does the extract step: it inserts the data into the database, and the rest of the ETL is done by the MSSQL query engine, running plain SQL from text files that the .NET code reads and executes against MSSQL.
My idea is to move from .NET to Java and use Spoon's features. The advantage is that I'd have Spoon's components available, Table output for example.
One issue I have is that some flat files arrive corrupted. For example, accented letters are replaced by the separator character, so I can't just tell the ETL tool to split columns on the separator. I first need to check how many separators are present in each line and handle the line if there are more than expected.
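To make the separator check concrete, here is a minimal sketch in plain Java of what I have in mind. The class name SeparatorCheck, the method names, and the ';' separator in the usage note are my own placeholders, not anything from Spoon, and the sketch only detects the problem. How to repair a bad line is still open.

```java
// Hypothetical helper, not part of Pentaho/Kettle: detects lines whose
// separator count does not match the expected number of columns.
class SeparatorCheck {

    // Counts how many times the separator character occurs in the line.
    static int countSeparators(String line, char separator) {
        int count = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == separator) {
                count++;
            }
        }
        return count;
    }

    // A line with N columns should contain exactly N - 1 separators.
    static boolean hasExpectedColumns(String line, char separator, int expectedColumns) {
        return countSeparators(line, separator) == expectedColumns - 1;
    }
}
```

For a three-column file using ';', a call such as SeparatorCheck.hasExpectedColumns(line, ';', 3) would flag any line where an accented letter was turned into an extra separator.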
I also need to verify whether a file has already been processed, whether it has finished being copied over the network, etc. I also don't want SQL code stored inside Execute SQL Script components. I want it saved in plain text files so that Subversion can track changes to it, and the ETL tool should read those files and send their contents to MSSQL to be executed.
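Roughly, the file handling I mean looks like the sketch below, using only standard Java (java.nio.file and JDBC). The class and method names are hypothetical, the "size has not changed" test is only a heuristic for "finished copying" and not a guarantee, and I'm assuming each text file holds a single batch that the driver can execute in one call.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper, not part of Pentaho/Kettle.
class SqlFileRunner {

    // Heuristic: treat the file as fully copied if its size does not
    // change while we wait. A stalled copy would still pass this check.
    static boolean isStable(Path file, long waitMillis)
            throws IOException, InterruptedException {
        long before = Files.size(file);
        Thread.sleep(waitMillis);
        return Files.size(file) == before;
    }

    // Reads the whole SQL script from a text file (UTF-8 is an assumption).
    static String readSql(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    // Sends the script to the database as one statement.
    static void runScript(Connection connection, Path file)
            throws IOException, SQLException {
        try (Statement statement = connection.createStatement()) {
            statement.execute(readSql(file));
        }
    }
}
```

Tracking which files were already processed would need some persistent record, for example a log table in MSSQL, which this sketch leaves out.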
So my idea is to build the ETL normally in Spoon's GUI, and then use Eclipse to develop against its SDK to customize the execution. For example, I'd use the standard Text File Input component in the GUI, but my jar would have its own class extending the standard one, overriding the method responsible for receiving a line string and splitting it into fields so that it can handle any issues.
My jar would instantiate my class instead of Spoon's and provide its object to the engine.
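To illustrate the pattern only: the sketch below uses stand-in classes that I made up (StockLineSplitter and TolerantLineSplitter). They are not Kettle's real classes, and I don't yet know which class or method in the SDK is the right hook. Throwing an exception is just a placeholder for whatever repair logic I end up writing.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the stock component's splitting logic; NOT a real Kettle class.
class StockLineSplitter {
    protected final char separator;

    StockLineSplitter(char separator) {
        this.separator = separator;
    }

    // Splits the line on every occurrence of the separator.
    List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == separator) {
                fields.add(line.substring(start, i));
                start = i + 1;
            }
        }
        fields.add(line.substring(start));
        return fields;
    }
}

// Hypothetical subclass from my jar: reuses the stock split, then
// rejects lines whose field count is not the expected one.
class TolerantLineSplitter extends StockLineSplitter {
    private final int expectedFields;

    TolerantLineSplitter(char separator, int expectedFields) {
        super(separator);
        this.expectedFields = expectedFields;
    }

    @Override
    List<String> split(String line) {
        List<String> fields = super.split(line);
        if (fields.size() != expectedFields) {
            // Placeholder: real code would try to repair the line here.
            throw new IllegalArgumentException("Expected " + expectedFields
                    + " fields but found " + fields.size() + ": " + line);
        }
        return fields;
    }
}
```

The engine would then be handed a TolerantLineSplitter wherever it currently expects the stock object, which is the part I'm unsure Spoon allows.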
Is it viable, or too complex to bother with?