I have an SSIS task that reads JSON files from Azure Data Lake, parses them with deserialize command in a Script Task and creates copies of them as tables in a Local SQL Server.
This is going on well, but very slow. It takes me 6 hours to import one directory and there are many.
So I decided to make this in an incremental fashion.
At first, I used Conditional Split to avoid writing the old records into the local SQL server, but this still involved the time-cost of reading the whole file. And Conditional Split is a funny component, I have observed it just skip some records for no reason. Not good.
So, why not just skip the whole file that I have imported previously? But for this, I need to get the last-modified date of the file I am processing.
Now I know there are some .Net methods for doing this in plain c# (Filesystem operations on Azure Data Lake Storage Gen1 using .NET SDK) but since I am using a script component of SSIS I already have a valid connection to the Data Lake. Should be much easier right?
But I couldn't find a way, I appreciate any guidance from here.
Cheers