How to only read specific file from External Table that is pointing to a Folder in ADLS that has thousands of file ?
Asked
Active
Viewed 266 times
3
-
3You can't. Recreate the external table pointing to that specific file in the location property. – GregGalloway Dec 29 '17 at 01:05
-
4Vote for this feedback item: https://feedback.azure.com/forums/307516-sql-data-warehouse/suggestions/13749543-function-that-returns-the-name-of-the-current-file – GregGalloway Dec 29 '17 at 01:07
1 Answers
0
You can't do that with external tables / Polybase when the external table has already been created, but what you could do is create your own external table specifying the filename in the definition. eg if you table definition is like this (where a filename is not specified):
CREATE EXTERNAL TABLE ext.LINEITEM (
L_ORDERKEY BIGINT NOT NULL,
...
)
WITH (
LOCATION = 'input/lineitem/',
DATA_SOURCE = AzureDataLakeStore,
FILE_FORMAT = TextFileFormat
);
You could copy it and your own table, eg
CREATE EXTERNAL TABLE ext.LINEITEM_42 (
L_ORDERKEY BIGINT NOT NULL,
...
)
WITH (
LOCATION = 'input/lineitem/lineitem42.txt',
DATA_SOURCE = AzureDataLakeStore,
FILE_FORMAT = TextFileFormat
);
See the difference? Another alternative would be to use one of the languages / platforms that can easily access data lake eg U-SQL, Databricks to write a query accessing the lake, eg a little U-SQL:
@input =
EXTRACT
l_orderkey int
...
FROM "/input/lineitem/lineitem42.txt"
USING Extractors.Csv(skipFirstNRows : 1 );
A little Scala:
val lineitem42 = "/mnt/lineitem/lineitem42.txt"
var df42 = spark.read
.option("sep", "|") // Use pipe separator
.csv(lineitem42)

wBob
- 13,710
- 3
- 20
- 37