I am moving data within folder from Azure Data Lake to a SQL Server using Azure Data Factory (ADF).
The folder contains hundreds of .csv files. However, one inconsistent problem with these csv's is that some (not all) have a final row that contains a special character, which when trying to load to a sql table with datatypes other than NVARCHAR(MAX) will fail. To get around this, I have to first use ADF to load the data into staging tables where all columns are set to NVARCHAR(MAX), then I insert those rows that do not contain a special character into tables that have the appropriate data type.
This is a weekly process, and is over a terabyte of data and it takes forever to move the data so I am looking into ways to import into my final tables rather than having a staging component.
I notice that there is a 'pre-copy script' field that can execute before the load to sql server. I want to add code that will allow me to parse out special characters OR null rows before loading to sql server.
I am unsure of how to approach this since the csv's would not be stored in a table, so SQL code wouldn't work. Any guidance on how I can utilize the pre-copy script to clean my data before loading it into sql server?