3

How to only read specific file from External Table that is pointing to a Folder in ADLS that has thousands of file ?

GregGalloway
  • 11,355
  • 3
  • 16
  • 47
NSS
  • 1,835
  • 2
  • 29
  • 66
  • 3
    You can't. Recreate the external table pointing to that specific file in the location property. – GregGalloway Dec 29 '17 at 01:05
  • 4
    Vote for this feedback item: https://feedback.azure.com/forums/307516-sql-data-warehouse/suggestions/13749543-function-that-returns-the-name-of-the-current-file – GregGalloway Dec 29 '17 at 01:07

1 Answers1

0

You can't do that with external tables / Polybase when the external table has already been created, but what you could do is create your own external table specifying the filename in the definition. eg if you table definition is like this (where a filename is not specified):

CREATE EXTERNAL TABLE ext.LINEITEM (
    L_ORDERKEY      BIGINT          NOT NULL,
    ...
    )
WITH (
    LOCATION = 'input/lineitem/',
    DATA_SOURCE = AzureDataLakeStore, 
    FILE_FORMAT = TextFileFormat
);

You could copy it and your own table, eg

CREATE EXTERNAL TABLE ext.LINEITEM_42 (
    L_ORDERKEY      BIGINT          NOT NULL,
    ...
    )
WITH (
    LOCATION = 'input/lineitem/lineitem42.txt',
    DATA_SOURCE = AzureDataLakeStore, 
    FILE_FORMAT = TextFileFormat
);

See the difference? Another alternative would be to use one of the languages / platforms that can easily access data lake eg U-SQL, Databricks to write a query accessing the lake, eg a little U-SQL:

@input = 
EXTRACT 
    l_orderkey int
    ...

FROM "/input/lineitem/lineitem42.txt"
USING Extractors.Csv(skipFirstNRows : 1 );

A little Scala:

val lineitem42 = "/mnt/lineitem/lineitem42.txt"

var df42 = spark.read       
  .option("sep", "|")        // Use pipe separator
  .csv(lineitem42)
wBob
  • 13,710
  • 3
  • 20
  • 37