I am trying to shift data from Snowflake to Postgresql and to do so I first load it into s3 in CSV format. In the table, comas in text could appear, I therefore use FIELD_OPTIONALLY_ENCLOSED_BY
snowflake unloading option to quote the content of the problematic cells. However when this happen + null values, I can't manage to have a valid CSV for PostgreSQL.
I created a simple table for you to understand the issue. Here it is :
CREATE OR REPLACE TABLE PUBLIC.TEST(
TEXT_FIELD VARCHAR(),
NUMERIC_FIELD INT
);
INSERT INTO PUBLIC.TEST VALUES
('A', 1),
(NULL, 2),
('B', NULL),
(NULL, NULL),
('Hello, world', NULL)
;
COPY INTO @STAGE/test
FROM PUBLIC.TEST
FILE_FORMAT = (
COMPRESSION = NONE,
TYPE = CSV,
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
NULL_IF = ''
)
OVERWRITE = TRUE;
Snowflake will from that create the following CSV
"A",1
"",2
"B",""
"",""
"Hello, world",""
But after that, it is for me impossible to copy this CSV inside a PostgreSQL Table as it is.
Even thought from PostgreSQL documentation we have next to NULL option :
Specifies the string that represents a null value. The default is \N (backslash-N) in text format, and an unquoted empty string in CSV format.
Not setting COPY Option in PostgreSQL COPY INTO will result in a failed unloading. Indeed it won't work as we also have to specify the quote used using QUOTE. Here it'll be QUOTE '"'
Therefore during POSTGRESQL unloading, using :
FORMAT csv, HEADER false, QUOTE '"'
will give :
DataError: invalid input syntax for integer: "" CONTEXT: COPY test, line 3, column numeric_field: ""
FORMAT csv, HEADER false, NULL '""', QUOTE '"'
will give :
NotSupportedError: CSV quote character must not appear in the NULL specification
FYI, To test the unloading in s3 I will use this command in PostgreSQL:
CREATE IF NOT EXISTS TABLE PUBLIC.TEST(
TEXT_FIELD VARCHAR(),
NUMERIC_FIELD INT
);
CREATE EXTENSION IF NOT EXISTS aws_s3 CASCADE;
SELECT aws_s3.table_import_from_s3(
'PUBLIC.TEST',
'',
'(FORMAT csv, HEADER false, NULL ''""'', QUOTE ''"'')',
'bucket',
'test_0_0_0.csv',
'aws_region'
)
Thanks a lot for any ideas on what I could do to make it happen? I would love to find a solution that don't requires modifying the csv between snowflake and postgres. I think it is an issue more on the Snowflake side as it don't really make sense to quote null values. But PostgreSQL is not helping either.