I want to load data into an Amazon Redshift external table. The data is in CSV format and contains quotes. Is there an option like REMOVEQUOTES, which the COPY command provides, for Redshift external tables? Also, what are the options for loading fixed-length data into an external table?

SauravT
As of right now, there is unfortunately no property to remove the quotes, as the OpenCSV SerDe variant is not supported yet. – grundprinzip Jul 07 '17 at 22:36
2 Answers
To create an external Spectrum table, you should reference the CREATE TABLE syntax provided by Athena. To load a CSV file whose fields are enclosed in double quotes, use the following lines as your ROW FORMAT:
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
'separatorChar' = ',',
'quoteChar' = '\"',
'escapeChar' = '\\'
)
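For context, here is a minimal sketch of how that ROW FORMAT might sit inside a complete Spectrum DDL; the external schema, table name, column names, and S3 location are placeholders:

CREATE EXTERNAL TABLE spectrum_schema.customer_csv (
  customer_id   VARCHAR(20),
  customer_name VARCHAR(100),
  city          VARCHAR(50)
)
-- the SerDe strips the quotes around each field as it reads the file
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'separatorChar' = ',',
  'quoteChar' = '\"',
  'escapeChar' = '\\'
)
STORED AS TEXTFILE
LOCATION 's3://my-bucket/customer-csv/';

The OpenCSVSerDe generally hands every field back as a string, so declaring the columns as VARCHAR and casting at query time is the safest choice.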
For fixed-length files, you should use the RegexSerDe. In that case, the relevant portion of your CREATE TABLE statement will look like this (assuming 3 fields of length 100):
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" = "(.{100})(.{100})(.{100})")
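Pulling that together, a complete statement for the fixed-length case might look like the sketch below (the table name, column names, and S3 path are placeholders):

CREATE EXTERNAL TABLE spectrum_schema.fixed_width_data (
  field1 VARCHAR(100),
  field2 VARCHAR(100),
  field3 VARCHAR(100)
)
-- each capture group in input.regex feeds one column, in order
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" = "(.{100})(.{100})(.{100})")
STORED AS TEXTFILE
LOCATION 's3://my-bucket/fixed-width/';

The number of capture groups should match the number of columns in the table, and each captured value arrives as a string.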

Andrew Jones
You can also use a regex to parse data enclosed by multiple characters. For example, in a CSV file where fields were surrounded by triple double quotes ("""):
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  'input.regex' = "^\"*([^\"]*)\"*,\"*([^\"]*)\"*,\"*([^\"]*)\"*,\"*([^\"]*)\"*,\"*([^\"]*)\"*,\"*([^\"]*)\"*,\"*([^\"]*)\"*,\"*([^\"]*)\"*,\"*([^\"]*)\"*$"
)
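As an illustration with made-up values, a nine-field row like the one in the comment below would be split into nine columns, one per capture group; a matching table just declares nine VARCHAR columns (all names and the S3 path are placeholders):

-- sample input line:
-- """1""","""Smith""","""John""","""NY""","""10001""","""2017""","""A""","""B""","""C"""
CREATE EXTERNAL TABLE spectrum_schema.triple_quoted_csv (
  col1 VARCHAR(255), col2 VARCHAR(255), col3 VARCHAR(255),
  col4 VARCHAR(255), col5 VARCHAR(255), col6 VARCHAR(255),
  col7 VARCHAR(255), col8 VARCHAR(255), col9 VARCHAR(255)
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  'input.regex' = "^\"*([^\"]*)\"*,\"*([^\"]*)\"*,\"*([^\"]*)\"*,\"*([^\"]*)\"*,\"*([^\"]*)\"*,\"*([^\"]*)\"*,\"*([^\"]*)\"*,\"*([^\"]*)\"*,\"*([^\"]*)\"*$"
)
STORED AS TEXTFILE
LOCATION 's3://my-bucket/triple-quoted-csv/';

Each \"*([^\"]*)\"* pair consumes any run of surrounding quotes and captures the bare field value.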

Franke