1

I'm trying to create an external table to a CSV file stored in Azure Storage.

The CSV data looks like this :-

Date Rail Period Calendar Year Calendar Month Calendar Month Name Fiscal Year Fiscal Period Weekday Weekday Number
26/04/2021 2201 2021 4 April 2022 Period 1 Monday 1
27/04/2021 2201 2021 4 April 2022 Period 1 Tuesday 2
28/04/2021 2201 2021 4 April 2022 Period 1 Wednesday 3
29/04/2021 2201 2021 4 April 2022 Period 1 Thursday 4
30/04/2021 2201 2021 4 April 2022 Period 1 Friday 5
01/05/2021 2201 2021 5 May 2022 Period 2 Saturday 6
02/05/2021 2202 2021 5 May 2022 Period 2 Sunday 7
03/05/2021 2202 2021 5 May 2022 Period 2 Monday 1
04/05/2021 2202 2021 5 May 2022 Period 2 Tuesday 2

I've created the External File Format using the following code

CREATE EXTERNAL FILE FORMAT csvFile
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (
      FIELD_TERMINATOR = ',',
      STRING_DELIMITER = '"',
      FIRST_ROW = 2,
      USE_TYPE_DEFAULT = TRUE,
      ENCODING = 'UTF8' )
);

and the External table as follows

CREATE EXTERNAL TABLE ext.DateDimension(
    [Date]                DATE, 
    [Rail Period]         INT, 
    [Calendar Year]       INT, 
    [Calendar Month]      INT,
    [Calendar Month Name] VARCHAR(9),
    [Fiscal Year]         INT, 
    [Fiscal Period]       VARCHAR(9), 
    [Weekday]             VARCHAR(9), 
    [Weekday Number]      INT)
WITH(
    DATA_SOURCE = [tfwpbstore_ADLSG2], 
    LOCATION = '/Generic Datasets/Date Dimension.csv',
    FILE_FORMAT = csvFile);

However, when I try to SELECT from the external table it gives me the following error

Msg 107090, Level 16, State 1, Line 1 HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: HadoopSqlException: Error converting data type VARCHAR to DATETIME.

and I'm not quite sure what's wrong. If anyone could help, I'd be most grateful.

  • `'26/04/2021'` is a language specific date format; specifically it'll likely only work if your language setting is set to `BRITISH`. I suspect Synapse or Polybase dislikes that as it's probably working in American `ENGLISH`. if you can ensure your value is language agnostic, you'd be better off. I.e. `yyyyMMdd` or `yyyy-MM-dd` (the latter is only agnostic for the new date and type data types). – Thom A May 26 '21 at 15:36
  • Prior to loading the CSV, you can SET DATEFORMAT DMY – John Cappelletti May 26 '21 at 15:39
  • … https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-file-format-transact-sql?view=sql-server-ver15&tabs=delimited#date_format – lptr May 26 '21 at 15:58
  • @Larnu @lptr Thank you very much all - I included `DATE_FORMAT = 'dd/MM/yyyy'` and it worked – Richard Cooper May 27 '21 at 08:51

1 Answers1

1

As mentioned in the comments above I needed to define the date format used in the file format statement as follows:

CREATE EXTERNAL FILE FORMAT csvFile_ddMMyyyy_fr2
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (
      FIELD_TERMINATOR = ',',
      STRING_DELIMITER = '"',
     DATE_FORMAT = 'dd/MM/yyyy',
      FIRST_ROW = 2,
      USE_TYPE_DEFAULT = TRUE,
      ENCODING = 'UTF8' )
);