I have data in a blob storage account, which is organized in the following structure:
/logs/YYYY/mm/dd/HH
Let's say there is a few years of data and I want to pull data from a specific time period (or time periods) into an Azure Data Warehouse table.
How does the following differ in terms of how Polybase retrieves the data:
1) Create table as Select from External table pointing to the root folder (/logs) with a where clause which takes into account a range of values for a particular time value (e.g Timestamp between x and y).
2) A CTAS from a UNION of external tables which point to the individual folders that make up the specific time periods I want.
I assume 2) above will be quicker, because Polybase doesn't have to look in any unwanted folder? Is this correct?
If so, is there a way to select from an external table which points to a root folder, but in some way only selecting files from a specific time period?
Perhaps there's some documentation that explains this?