Step 1: We need to copy CSV files from an on-premises file server to Azure Blob Storage (say, a 'Staging' container in Blob Storage). Step 2: Using PolyBase, we load the data from these files into Azure SQL Data Warehouse.

We keep the same file names (in sync with the staging DB tables) every time the files are copied from the on-premises file server to Azure Blob Storage. The challenge is loading the data from Blob Storage into Azure SQL Data Warehouse: during each batch cycle (an ADF pipeline run), we have to process and load all the files in staging. We run 4 batch cycles every day, and each cycle processes the latest files as well as old files that were already processed. Is there any way to load only the files currently available on the on-premises file server for each individual batch job? (That is, we would copy these files to staging and process only these files into SQL DWH, without touching the others.)

Koushik

1 Answer

I ran into the same issue. What I did was add an ExtractDate column to the CSV file and then select from PolyBase only the records with the ExtractDate I wanted. PolyBase currently doesn't support delta file detection from Blob Storage, so this workaround did the job for me.
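
To make the workaround concrete, here is a minimal Python sketch of the two pieces: stamping each CSV with an ExtractDate column before it is uploaded, and building the filtered load statement. The function names and the table names in the usage note (`ext.Sales`, `dbo.Sales`) are hypothetical, not something from my actual pipeline, and it assumes the CSV has a header row and that the external table definition includes the ExtractDate column.

```python
import csv
import io
from datetime import date


def add_extract_date(csv_text: str, extract_date: date) -> str:
    """Return csv_text with an ExtractDate column appended to every row.

    Assumes the first non-empty row is the header.
    """
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header_written = False
    for row in reader:
        if not row:
            continue  # skip blank lines
        if not header_written:
            writer.writerow(row + ["ExtractDate"])
            header_written = True
        else:
            writer.writerow(row + [extract_date.isoformat()])
    return out.getvalue()


def build_load_query(external_table: str, target_table: str,
                     extract_date: date) -> str:
    """Build a T-SQL statement that loads only one extract date.

    Table names are interpolated as-is, so pass trusted identifiers only.
    """
    return (
        f"INSERT INTO {target_table} "
        f"SELECT * FROM {external_table} "
        f"WHERE ExtractDate = '{extract_date.isoformat()}';"
    )
```

For example, `build_load_query("ext.Sales", "dbo.Sales", date.today())` gives a statement that inserts only today's rows. Note that the filter limits which rows are loaded, not which files are read, so old files left in the container are still scanned.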

Pratik Somaiya