I am working in a data warehouse project with a lot of sources creating flat files as sources and we are using SSIS to load these into our staging tables, we are currently using the Flat File Source component.
However, after a while, we need an extra column in one of the files and from a date the file specification change to add that extra column. This exercise happens quite frequently and over time accumulate quite a lot versions.
According to answers I can find here and on the rest of the internet the agreed method to handle this scenario seems to be to set up a new flat file source in a new separate data flow for this version, to keep re-runablility for ETL process for old files. Method is outlined here for example: SSIS pkg with flat-file connection with fewer columns will fail
In our specific setup, the additional columns are always additional columns (never remove old columns) and also, for logical reasons the new columns can not be mandantory if we keep re-runability for the older files in their separate data flows.
I don´t think the method of creating a duplicate data flow handling the same set of columns over and over again is a good answer for a data warehouse project as ours and I would prefeer a source component that takes the last file version and have the ability to mark columns as "not mandadory" and deliver nulls if they are missing.
Is anybody aware of a SSIS Flat File component that is more flexible in handle old file versions or have a better solution for this problem? I assume that such a component would need to approach the files on a named column basis rather than the existing left-to-right approach?
Any thoughts or suggestions are welcome!