3

I am trying to import a CSV file into a Redshift database from a CSV file stored on S3. The CSV file is fully quoted (so even a fully integer column is quoted using ").

The table also contains doubles that are also quoted and use the Dutch way of separating the whole numbers and the decimal (1,5 is one and a half).

So basically, my file looks like this:

"int_key", "double_value"
"1", "2,50"
"2", "3,50"

If I import it 'plainly', I get the error:

Invalid digit, Value '"', Pos 0, Type: Integer

Then, I added removequotes to my query to get my integer integer processed properly. However, now my double column does not understand the comma:

Invalid digit, Value ',', Pos 1, Type: Double 

Is there any way to tell Redshift to use comma's to make doubles?

The file is too large to just regex through and replace the comma's with periods.

Jorick Spitzen
  • 1,559
  • 1
  • 13
  • 25
  • Similar to: [importing data with commas in numeric fields into redshift](http://stackoverflow.com/questions/39073102/importing-data-with-commas-in-numeric-fields-into-redshift) – John Rotenstein May 11 '17 at 20:57
  • Does your sample intentionally have a delimiter of a comma AND a space? That's going to cause problems, too. Or, does your data only have a comma as a delimiter? – John Rotenstein May 11 '17 at 21:56
  • Only comma. I typed the space as a force of habit I guess – Jorick Spitzen May 12 '17 at 07:04

1 Answers1

3

Your sample data contains a separator of a comma AND a space:

"int_key", "double_value"
"1", "2,50"
"2", "3,50"

This alone appears to prevent it from loading into Amazon Redshift.

However, if you only have a comma separator, it is possible, eg:

"int_key","double_value"
"1","2,50"
"2","3,50"

You could load the 2nd field as a string, then convert it using the REPLACE() function.

CREATE TABLE foo AS (i int, s string, f float);
COPY INTO foo FROM 's3://...' DELIMITER ',' IGNOREHEADER 1 REMOVEQUOTES;
UPDATE foo SET f = REPLACE(s, ',' , '.')::float;

It will take a bit of time to run the UPDATE command, but that's easier than modifying your input files.

John Rotenstein
  • 241,921
  • 22
  • 380
  • 470