4

I want to load data to amazon redshift external table. Data is in CSV format and has quotes. Do we have something like REMOVEQUOTES which we have in copy command for redshift external tables. Also what are different options to load fixed length data in external table.

John Rotenstein
  • 241,921
  • 22
  • 380
  • 470
SauravT
  • 43
  • 1
  • 1
  • 3
  • 1
    As of right now, there is unfortunately no property to remove the quotes as the OpenCSV SerDe variant is not supported yet. – grundprinzip Jul 07 '17 at 22:36

2 Answers2

9

To create an external Spectrum table, you should reference the CREATE TABLE syntax provided by Athena. To load a CSV escaped by double quotes, you should use the following lines as your ROW FORMAT

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
    'separatorChar' = ',',
    'quoteChar' = '\"',
    'escapeChar' = '\\'
)

For fixed length files, you should use the RegexSerDe. In this case, the relevant portion of your CREATE TABLE statement will look like this (assuming 3 fields of length 100).

ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" = "(.{100})(.{100})(.{100})")
Andrew Jones
  • 1,382
  • 10
  • 26
0

You can also use regex to parse data enclosed by multiple characters. Example (in CSV file, fields were surrounded by triple double-quotes (""")):

ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.AbstractSerDe'
WITH SERDEPROPERTIES (
    'input.regex' = "^\"*([^\"]*)\"*,\"*([^\"]*)\"*,\"*([^\"]*)\"*,\"*([^\"]*)\"*,\"*([^\"]*)\"*,\"*([^\"]*)\"*,\"*([^\"]*)\"*,\"*([^\"]*)\"*,\"*([^\"]*)\"*$"  ) 
) 
Franke
  • 1,234
  • 12
  • 14