I'm working through bulk loading data into Snowflake using Python. I use pandas to extract data from a source system and write it to a CSV. The issue occurs when copying the data into Snowflake.
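For reference, the load process looks roughly like this (a simplified sketch; the connection parameters, source query, stage @TEST, and table FORUMTEST are placeholders for my actual objects):

import pandas as pd
import snowflake.connector

# Extract from the source system and write to CSV
# (source_conn is a placeholder for the source-system connection)
df = pd.read_sql("SELECT * FROM source_table", source_conn)
df.to_csv("forumtest.csv", index=False)

conn = snowflake.connector.connect(
    account="...", user="...", password="...",
    warehouse="...", database="...", schema="...",
)
cur = conn.cursor()
# Stage the file (compressed), then COPY it into the target table
cur.execute("PUT file://forumtest.csv @TEST AUTO_COMPRESS = TRUE")
cur.execute(
    "COPY INTO FORUMTEST FROM @TEST/forumtest.csv.gz "
    "file_format = (type = csv SKIP_HEADER = 1 "
    "FIELD_OPTIONALLY_ENCLOSED_BY = '\"' ESCAPE = ' ')"
)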

I receive the below error:

snowflake.connector.errors.ProgrammingError: 100065 (22000): Found character '4' instead of field delimiter ','

The load is hanging up on the following text, which is enclosed within a field:

"[**EXTERNAL**]: BID: ""4597290 - PERSONNE"

As you can see, the double quotes lined up just before the 4597290 are being treated as an escape. I'm not versed in how to escape out of this particular issue, but what I've successfully used for other files is below:

file_format = (type = csv SKIP_HEADER = 1 FIELD_OPTIONALLY_ENCLOSED_BY = '"' ESCAPE = ' ')

Is there a file format that will allow me to accommodate the example above, or is the best practice to remove the quotes from this field alone?

toobeyonewing
  • If the TAB character does not occur in your data, you could use TAB-delimited files (FIELD_DELIMITER = '\t'). Please check this other question, which I think provides solutions to your problem too: https://stackoverflow.com/questions/65986126/how-to-create-a-csv-file-format-definition-to-load-data-into-snowflake-table/66398797#66398797 – Francesco Quaratino Mar 09 '21 at 20:13
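To illustrate the tab-delimited suggestion above, the change might look roughly like this (a sketch, assuming no literal tab characters occur in the data; it reuses the placeholder stage, table, and connection from the snippet above):

# Write the extract tab-delimited instead of comma-delimited
df.to_csv("forumtest.tsv", sep="\t", index=False)

cur.execute("PUT file://forumtest.tsv @TEST AUTO_COMPRESS = TRUE")
cur.execute(
    # '\\t' sends the two characters \t, which Snowflake reads as a tab
    "COPY INTO FORUMTEST FROM @TEST/forumtest.tsv.gz "
    "file_format = (type = csv SKIP_HEADER = 1 FIELD_DELIMITER = '\\t')"
)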

1 Answer

There are a few options I can think of, plus you can look through the forums for other ideas (e.g., how to handle double quotes inside double-quoted fields).

  1. You can pre-process the data. This may include removing the double quotes, escaping them with another pair of double quotes, or using a different field delimiter or field enclosure (see the Python sketch at the end of this answer).

  2. You can load the data with all the double quotes included by removing the FIELD_OPTIONALLY_ENCLOSED_BY option and post-processing later as needed. Example below, using your existing file format without that option.

COPY INTO FORUMTEST from @TEST/forumtest.csv.gz file_format = (type = CSV ESCAPE = ' ');

  3. You can do a transformative SQL statement during the COPY, again removing the FIELD_OPTIONALLY_ENCLOSED_BY option. SQL statement below, supposing we had 3 columns of double-quoted data, just as an example.

COPY into FORUMTEST from (select replace($1,'"',''), replace($2,'"',''), replace($3,'"','') from @TEST/forumtest.csv.gz) file_format = (type = CSV ESCAPE = ' ');
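For option 1, here's a minimal pre-processing sketch in Python (an assumption on my part that Python's csv module can parse your file; by default it treats a doubled quote inside a quoted field as an escaped quote, and the file names are placeholders):

import csv

# Rewrite the file with all double quotes stripped out of each field;
# QUOTE_ALL on output keeps every field safely enclosed afterwards.
with open("forumtest.csv", newline="") as src, \
        open("forumtest_clean.csv", "w", newline="") as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst, quoting=csv.QUOTE_ALL)
    for row in reader:
        writer.writerow([field.replace('"', '') for field in row])

This only makes sense if the embedded quotes aren't meaningful data; if they are, let the writer keep them (csv.writer doubles quotes by default, which FIELD_OPTIONALLY_ENCLOSED_BY = '"' can read back).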

Suzy Lockwood
  • I'm working through each of your answers one by one. I have been testing option 3 and, of course, it's not liking it. My hope is that I can accomplish this by using a suffix in Snowflake. Thank you for all the help! I'll let you know what I find. – toobeyonewing Mar 15 '21 at 16:26
  • Sounds good. This was just an example, as your data is going to differ (it worked based on the one piece of sample data you provided), but you can use this, REGEXP_REPLACE, or other functions that may work better for you. – Suzy Lockwood Mar 15 '21 at 16:49