0

I am creating table in Athena from data in s3. Here is short version of what query looks like.

CREATE EXTERNAL TABLE `tablename`(
  `licensee_pub` string COMMENT 'from deserializer', 
  `admin_number` string COMMENT 'from deserializer', 
  `account_name` string COMMENT 'from deserializer', 
  `ipi_number` string COMMENT 'from deserializer', 
  `title` string COMMENT 'from deserializer', 
  `name` string COMMENT 'from deserializer',
  `play_minutes` string COMMENT 'from deserializer', 
  `play_seconds` string COMMENT 'from deserializer')
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.OpenCSVSerde' 
WITH SERDEPROPERTIES ( 
  'escapeChar'='\\', 
  'separatorChar'=',') 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'path/to/s3'
TBLPROPERTIES (
  'has_encrypted_data'='false')

files in S3 is comma separated and though table is created fine but there is issues with few lines having record lines like below in file

"a","b","c","d","abc ///def\\\","e","21","32"

Instances of line where "abc ///def\\" is not created properly, putting all data coming after this in one (title) field only like

abc ///def\",e,21,32

Probable escaping and not identifying column correctly.

Is there any way to have an exception for a text like def\\ to not escape, or any other workaround?

dtolnay
  • 9,621
  • 5
  • 41
  • 62
nik
  • 3,688
  • 3
  • 21
  • 33

1 Answers1

1

The escapeChar property in your DDL says the escape char is a single \ (backslashes in escapeChar need to be escaped, so two backslashes really means one).

This means that when the serde encounters "\\\" it will consume the first backslash as an escape for the second, and the third as escape for the ". Since the end quote is escaped the serde will continue reading until the next (unescaped) quote.

If you don't have actual escaped quotes you can just change the escapeChar to something else, and then decide in your application what \\\ means.

Theo
  • 131,503
  • 21
  • 160
  • 205
  • unfortunately I have actual escaped quotes in my string. I need to escape other characters with \ but the one in the end of string. – nik Jun 15 '20 at 14:48
  • I don't think you can solve that problem. Either `\"` means an escaped quote or it doesn't, you can't express that it only means escape in some contexts. – Theo Jun 15 '20 at 14:53