6

I am new to Azure and Polybase, I am trying to read a CSV file into a SQL External Table.

I noticed that, it is not possible to skip the first row, the header on some forums I read.

I'm hoping for the opposite,Can you help me ?

The code I used is below.

Thanks in advance

 CREATE EXTERNAL TABLE dbo.Test2External (
   [Guid] [varchar](36) NULL,
    [Year] [smallint] NULL,
    [SysNum] [bigint] NULL,
    [Crc_1] [decimal](15, 2) NULL,
    [Crc_2] [decimal](15, 2) NULL,
    [Crc_3] [decimal](15, 2) NULL,
    [Crc_4] [decimal](15, 2) NULL,
    [CreDate] [date] NULL,
    [CreTime] [datetime] NULL,
    [UpdDate] [date] NULL,
    ...
WITH (
    LOCATION='/20160823/1145/FIN/',
    DATA_SOURCE=AzureStorage,
    FILE_FORMAT=TextFile
);


-- Run a query on the external table

SELECT count(*) FROM dbo.Test2External;
TheGameiswar
  • 27,855
  • 8
  • 56
  • 94
laurens
  • 497
  • 14
  • 28

3 Answers3

10

there is a workaround by using 'EXTERNAL FILE FORMAT' with 'FIRST_ROW = 2'. e.g. if we create a file format

CREATE EXTERNAL FILE FORMAT [CsvFormatWithHeader] WITH (
    FORMAT_TYPE = DELIMITEDTEXT, 
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = ',', 
        FIRST_ROW  = 2,
        STRING_DELIMITER = '"',
        USE_TYPE_DEFAULT = False
        )
)
GO

And then use this file format with create external table

CREATE EXTERNAL TABLE [testdata].[testfile1]
(
    [column1] [nvarchar](4000) NULL
)
WITH (DATA_SOURCE = data_source,
LOCATION = file_location,
FILE_FORMAT = [CsvFormatWithHeader],REJECT_TYPE = PERCENTAGE,REJECT_VALUE = 100,REJECT_SAMPLE_VALUE = 1000)

It will skip first row while executing queries for 'testdata.testfile1'.

Rizwan Hanif
  • 153
  • 1
  • 8
4

You have a few options:

  1. get the file headers removed permanently because Polybase isn't really meant to work with file headers
  2. Use Azure Data Factory which does have options for skipping header rows when the file is in Blob storage
  3. set the rejection options of the Polybase table to try and ignore the header row, ie setREJECT_TYPE to VALUE and the REJECT_VALUE to 1, eg this is a bit hacky as you don't have any control over whether or not this is actually the header row, but it would work if you only have one header row and it is the only error in the file. Example below.

For a file called temp.csv with this content:

a,b,c
1,2,3
4,5,6

A command like this will work:

CREATE EXTERNAL TABLE dbo.mycsv (
    colA INT NOT NULL,
    colB INT NOT NULL,
    colC INT NOT NULL
)
WITH (
    DATA_SOURCE = eds_esra,
    LOCATION = N'/temp.csv',
    FILE_FORMAT = eff_csv,
    REJECT_TYPE = VALUE,
    REJECT_VALUE = 1
    )
GO

SELECT *
FROM dbo.mycsv

My results:

Results

  1. set the datatypes of the external table to VARCHAR just for staging the data then remove the header row when converting to an internal table using something like ISNUMERIC, eg

    CREATE EXTERNAL TABLE dbo.mycsv2 (
        colA VARCHAR(5) NOT NULL,
        colB VARCHAR(5) NOT NULL,
        colC VARCHAR(5) NOT NULL
    )
    WITH (
        DATA_SOURCE = eds_esra,
        LOCATION = N'/temp.csv',
        FILE_FORMAT = eff_csv,
        REJECT_TYPE = VALUE,
        REJECT_VALUE = 0
        )
    GO
    
    
    CREATE TABLE dbo.mycsv3
    WITH (
        CLUSTERED INDEX ( colA ),
        DISTRIBUTION = ROUND_ROBIN
        )
    AS
    SELECT
        colA,
        colB,
        colC
    FROM dbo.mycsv2
    WHERE ISNUMERIC( colA ) = 1
    GO
    

HTH

wBob
  • 13,710
  • 3
  • 20
  • 37
1

Skip header rows on SQL Data Warehouse PolyBase load

Delimited text files are often created with a header row that contains the column names. These rows need to be excluded from the data set during the load. Azure SQL Data Warehouse users can now skip these rows by using the First_Row option in the delimited text file format for PolyBase loads. The First_Row option defines the first row that is read in every file loaded. By setting the value to 2, you effectively skip the header row for all files. For more information, see the documentation for the CREATE EXTERNAL FILE FORMAT statement.

Praneet
  • 11
  • 1