2

I am trying to shift data from Snowflake to Postgresql and to do so I first load it into s3 in CSV format. In the table, comas in text could appear, I therefore use FIELD_OPTIONALLY_ENCLOSED_BY snowflake unloading option to quote the content of the problematic cells. However when this happen + null values, I can't manage to have a valid CSV for PostgreSQL.
I created a simple table for you to understand the issue. Here it is :

CREATE OR REPLACE TABLE PUBLIC.TEST(
    TEXT_FIELD VARCHAR(),
    NUMERIC_FIELD INT
);
INSERT INTO PUBLIC.TEST VALUES
    ('A', 1),
    (NULL, 2),
    ('B', NULL),
    (NULL, NULL),
    ('Hello, world', NULL)
;
COPY INTO @STAGE/test
FROM PUBLIC.TEST
FILE_FORMAT = (
    COMPRESSION = NONE,
    TYPE = CSV,
    FIELD_OPTIONALLY_ENCLOSED_BY = '"'
    NULL_IF = ''
)
OVERWRITE = TRUE;

Snowflake will from that create the following CSV

"A",1
"",2
"B",""
"",""
"Hello, world",""

But after that, it is for me impossible to copy this CSV inside a PostgreSQL Table as it is.
Even thought from PostgreSQL documentation we have next to NULL option :

Specifies the string that represents a null value. The default is \N (backslash-N) in text format, and an unquoted empty string in CSV format.

Not setting COPY Option in PostgreSQL COPY INTO will result in a failed unloading. Indeed it won't work as we also have to specify the quote used using QUOTE. Here it'll be QUOTE '"'

Therefore during POSTGRESQL unloading, using :
FORMAT csv, HEADER false, QUOTE '"' will give :

DataError: invalid input syntax for integer: "" CONTEXT:  COPY test, line 3, column numeric_field: ""

FORMAT csv, HEADER false, NULL '""', QUOTE '"' will give :

NotSupportedError: CSV quote character must not appear in the NULL specification

FYI, To test the unloading in s3 I will use this command in PostgreSQL:

CREATE IF NOT EXISTS TABLE PUBLIC.TEST(
  TEXT_FIELD VARCHAR(),
  NUMERIC_FIELD INT
);
CREATE EXTENSION IF NOT EXISTS aws_s3 CASCADE;
SELECT aws_s3.table_import_from_s3(
'PUBLIC.TEST',
'',
'(FORMAT csv, HEADER false, NULL ''""'', QUOTE ''"'')',
'bucket',
'test_0_0_0.csv',
'aws_region'
)

Thanks a lot for any ideas on what I could do to make it happen? I would love to find a solution that don't requires modifying the csv between snowflake and postgres. I think it is an issue more on the Snowflake side as it don't really make sense to quote null values. But PostgreSQL is not helping either.

Mike Walton
  • 6,595
  • 2
  • 11
  • 22
Victor
  • 23
  • 6

1 Answers1

3

When you set the NULL_IF value to '', you are actually telling Snowflake to convert NULLS to a BLANK, which then get quoted. When you are copying out of Snowflake, the copy options are "backwards" in a sense and NULL_IF acts more like an IFNULL.

This is the code that I'd use on the Snowflake side, which will result in an unquoted empty string in your CSV file:

FILE_FORMAT = (
    COMPRESSION = NONE,
    TYPE = CSV,
    FIELD_OPTIONALLY_ENCLOSED_BY = '"'
    NULL_IF = ()
)
Mike Walton
  • 6,595
  • 2
  • 11
  • 22