1

I'm using external tables to load data from csv stored in a blob to a table in Azure SQL Data Warehouse. The csv uses a string delimiter (double quote), empty strings are represented as 2 double quotes ("").

I want the empty columns to be treated as NULL in the table. The external file format I use is set up with USE_TYPE_DEFAULT = FALSE, but this does not seem to work since empty columns are imported as empty strings. And this only tends to happen when the columns are strings, numeric columns are correctly converted to NULL.

I'm also importing a different csv which does not have a string delimiter using a different external file format and those empty columns are imported as NULL. So it looks like it has something to do with the STRING_DELIMITER option.

The csv:

col1;col2;col3;col4;col5;col6
"a";"b";"c";"1";"2";"3"
"d";"";"f";"4";"";"6"

The code of the external file format:

CREATE EXTERNAL FILE FORMAT eff_string_del
WITH (
        FORMAT_TYPE = DELIMITEDTEXT
        ,FORMAT_OPTIONS(
            FIELD_TERMINATOR = ';'
            ,STRING_DELIMITER = '0x22'
            ,FIRST_ROW = 2
            ,USE_TYPE_DEFAULT = False)
    )

Code of the table using the external file format:

CREATE EXTERNAL TABLE dbo.test (
col1 varchar(1) null
,col2 varchar(1) null
,col3 varchar(1) null
,col4 int null
,col5 int null
,col6 int null
)
WITH (
        DATA_SOURCE = [EDS]
        ,LOCATION = N'test.csv'
        ,FILE_FORMAT = eff_string_del
        ,REJECT_TYPE = VALUE
        ,REJECT_VALUE = 0
        )

The result when querying the external table:

SELECT *
FROM [dbo].[test]

col1 col2 col3 col4        col5        col6
---- ---- ---- ----------- ----------- -----------
a    b    c    1           2           3
d         f    4           NULL        6

Can someone please help me explain what is happening or what I'm doing wrong?

Ward
  • 39
  • 7
  • Is `UPDATE test SET col2=NULL WHERE col2 = ''` an option? I rarely insert a file straight into the target table. I always stage it first and clean / process it. – Nick.Mc Jan 22 '19 at 09:40

2 Answers2

1

Use USE_TYPE_DEFAULT = False in external file format. Any NULL values that are stored by using the word NULL in the delimited text file are imported as the string 'NULL'.

For example:

CREATE EXTERNAL FILE FORMAT example_file_format
WITH (FORMAT_TYPE = DELIMITEDTEXT,
      FORMAT_OPTIONS(
          FIELD_TERMINATOR = ',',
          STRING_DELIMITER = '"',
          FIRST_ROW = 2, 
          USE_TYPE_DEFAULT = False)
)

Reference : https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-file-format-transact-sql?view=sql-server-2017

Bukhbayar
  • 31
  • 4
  • Thank you for your answer but as you can see in my original message, I'm already using the `USE_TYPE_DEFAULT = False` in my external file format. – Ward Aug 28 '19 at 11:44
  • Yes, If you want to convert 'NULL' text into NULL values, you need to use ```USE_TYPE_DEFAULT = True``` However, you need to keep in mind that empty field with INT type becomes 0, empty field with varchar becomes empty string instead of NULL. So, you need to choose. – Bukhbayar Aug 30 '19 at 07:01
0

Have you considered adding the value NULL in that field instead of ""?

See below a test I've performed using the following code:

declare @mytable table
(id int identity primary key, column1 varchar(100))

insert into @mytable (column1)  values ('test1')
insert into @mytable (column1)  values ('test2')
insert into @mytable (column1)  values (null)
insert into @mytable (column1)  values ('test3')
insert into @mytable (column1)  values (null)

select
*
from    @mytable

The results looks like this:

enter image description here

Would this work for you?

Attie Wagner
  • 1,312
  • 14
  • 28
  • Thank you @Birel, but the thing here is that the records are inserted with Polybase by using an external table and not with a simple insert statement. – Ward Dec 11 '18 at 08:50
  • O I see, but can it not be changed in `Polybase` with the insert? – Attie Wagner Dec 11 '18 at 09:09