
I'm loading a CSV file from S3 into Redshift. The file contains analytics data, including a PageUrl field (which may contain user search info inside a query string, for example).

The load chokes on rows that contain a single double-quote character. For example, if there is a page for a 14" toy, the PageUrl would contain:

http://www.mywebsite.com/a-14"-toy/1234.html

Redshift understandably can't handle this as it is expecting a closing double quote character.
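For reference, the COPY I'm running is roughly of this shape (the table name, bucket path and IAM role below are placeholders, not my real values):

    -- Sketch of the failing load; table, path and role ARN are placeholders
    COPY analytics_pageviews
    FROM 's3://my-bucket/analytics/pageviews.csv'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    CSV;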

The way I see it my options are:

  1. Pre-process the input and remove these characters
  2. Configure the COPY command in Redshift to ignore these characters but still load the row
  3. Set MAXERRORS to a high value and sweep up the errors using a separate process

Option 2 would be ideal, but I can't find it!

Any other suggestions if I'm just not looking hard enough?

Thanks

Duncan


3 Answers

16

It's 2017 and I ran into the same problem. I'm happy to report there is now a way to get Redshift to load CSV files with the odd " in the data.

The trick is to use the ESCAPE keyword and NOT to use the CSV keyword. I don't know why, but having the CSV and ESCAPE keywords together in a COPY command resulted in failure with the error message "CSV is not compatible with ESCAPE;". However, with no change to the data being loaded, I was able to load successfully once I removed the CSV keyword from the COPY command.
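To give a rough idea, the working command ended up looking something like this (the table name, bucket path and IAM role are placeholders):

    -- ESCAPE without the CSV keyword; table, path and role ARN are placeholders
    COPY analytics_pageviews
    FROM 's3://my-bucket/analytics/pageviews.csv'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    DELIMITER ','
    ESCAPE;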

You can also refer to this documentation for help: http://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-conversion.html#copy-escape

ayeletd
11

Unfortunately, there is no way to fix this. You will need to pre-process the file before loading it into Amazon Redshift.

The closest options you have are CSV [ QUOTE [AS] 'quote_character' ] to wrap fields in an alternative quote character, and ESCAPE if the quote character is preceded by a backslash. Alas, both require the file to be in a particular format before loading.
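For illustration, the QUOTE AS variant would look roughly like this (the table name, path, role ARN and the '%' quote character are placeholders, and the file would already need to use that character for quoting). The ESCAPE variant is the same idea, with DELIMITER ',' ESCAPE in place of the CSV clause and backslash-escaped quotes in the data:

    -- CSV with an alternative quote character; all names here are placeholders
    COPY analytics_pageviews
    FROM 's3://my-bucket/analytics/pageviews.csv'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    CSV QUOTE AS '%';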

See the COPY data conversion parameters documentation: http://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-conversion.html

John Rotenstein
    Thanks John. Yeah I've resigned myself to writing a little PIG job to precede all " with \" which should do the trick. – Duncan Aug 08 '15 at 12:13
  • Depending upon the size of your data, you could just run things through `sed`. You could even do it as a streaming job through `aws s3 cp s3://b/f - | sed xxx | aws s3 cp - s3://b/f2`. See streaming on https://docs.aws.amazon.com/cli/latest/reference/s3/cp.html – John Rotenstein Aug 09 '15 at 05:55
  • @ayeletd 's answer below provides a solution without pre-processing – MyStackRunnethOver Oct 07 '20 at 20:15
-1

I have done this using DELIMITER ',' IGNOREHEADER 1 as the replacement for CSV at the end of the COPY command. It's working really fine.
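In other words, a COPY of roughly this shape (the table name, path and role ARN are placeholders):

    -- DELIMITER and IGNOREHEADER instead of the CSV keyword; names are placeholders
    COPY analytics_pageviews
    FROM 's3://my-bucket/analytics/pageviews.csv'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    DELIMITER ','
    IGNOREHEADER 1;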