4

I have a data-set in S3

123, "some random, text", "", "", 236

I build a external table on this dataset :

CREATE EXTERNAL TABLE db1.myData(
    field1 bigint, 
    field2 string, 
    field3 string, 
    field4 string, 
    field5 bigint, 
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ','
ESCAPED BY '\\' 
LOCATION 's3n://thisMyData/';

Problem/ Issue :
when I do select * from db1.myData

field2 is shown as

some random

I need the field to be

some random, text

Gotcha's :
1. I cannot change the delimiter as there are over ~300 .csv files at this location
2. ESCAPED BY is not escaping the '\\'
3. I'm using HIVE 0.13 so there I cannot use CSV SerDe and neither i'm allowed to import new jars to cluster (its a complicated process to add a new jar as I have to go through Director level approvals)

Question:

  • Is there a workaround for making 'ESCAPED BY' come alive ?!
  • Any other workarounds for this ??

All suggestions are welcome !!

N.B : THis is not a repeat question. If you think its a repeat, please guide me to right page and I will take this off of this portal :)

Community
  • 1
  • 1
underwood
  • 845
  • 2
  • 11
  • 22
  • did you tried '\054' ? http://www.cloudera.com/documentation/archive/impala/2-x/2-0-x/topics/impala_create_table.html – n1tk May 07 '16 at 01:39
  • 1
    Note: The CREATE TABLE clauses FIELDS TERMINATED BY, ESCAPED BY, and LINES TERMINATED BY have special rules for the string literal used for their argument, because they all require a single character. You can use a regular character surrounded by single or double quotation marks, an octal sequence such as '\054' (representing a comma), or an integer in the range -127..128 (without quotation marks or backslash), which is interpreted as a single-byte ASCII character ... – n1tk May 07 '16 at 01:40
  • Thanks for your tip @sb0709.. I tried `FIELDS TERMINATED BY '\054' ESCAPED BY '#' ` and `FIELDS TERMINATED BY '\054' ESCAPED BY '\\' ` but neither of them worked. I'm dropping the table before recreating them. Another information I wish add here.. my metastore is S3. Please let me know if I'm missing something here – underwood May 07 '16 at 03:10
  • I would also like to add that in `some random, text` there is a single space between , and text `some random, _space_ text` – underwood May 07 '16 at 03:17
  • Did you ever solve this? If so, sharing what you found would be helpful. – Barry McNamara Jun 29 '18 at 16:43

2 Answers2

2

I had to use: ESCAPED BY '\134' which translates to: ESCAPED BY '\'.

Additionally, because I was calling the Athena create table statement by passing in the statement from a JSON file I had to add an extra \ to mask the original \ in JSON. So my final statement within the JSON file looked like this: ESCAPED BY '\\134'.

Robert Penridge
  • 8,424
  • 2
  • 34
  • 55
0

If you are using Hive 0.14, you can use CSV Serde like this:

CREATE TABLE my_table(a string, b string, ...)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
   "separatorChar" = "\t",
   "quoteChar"     = "'",
   "escapeChar"    = "\\"
)  
STORED AS TEXTFILE;

Refer below link for details:

https://cwiki.apache.org/confluence/display/Hive/CSV+Serde

Nishu Tayal
  • 20,106
  • 8
  • 49
  • 101