I am creating table in Athena from data in s3. Here is short version of what query looks like.
CREATE EXTERNAL TABLE `tablename`(
`licensee_pub` string COMMENT 'from deserializer',
`admin_number` string COMMENT 'from deserializer',
`account_name` string COMMENT 'from deserializer',
`ipi_number` string COMMENT 'from deserializer',
`title` string COMMENT 'from deserializer',
`name` string COMMENT 'from deserializer',
`play_minutes` string COMMENT 'from deserializer',
`play_seconds` string COMMENT 'from deserializer')
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
'escapeChar'='\\',
'separatorChar'=',')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'path/to/s3'
TBLPROPERTIES (
'has_encrypted_data'='false')
files in S3 is comma separated and though table is created fine but there is issues with few lines having record lines like below in file
"a","b","c","d","abc ///def\\\","e","21","32"
Instances of line where "abc ///def\\" is not created properly, putting all data coming after this in one (title) field only like
abc ///def\",e,21,32
Probable escaping and not identifying column correctly.
Is there any way to have an exception for a text like def\\ to not escape, or any other workaround?