4

I have a csv file which has data in below format:

"SomeName1",25,"SomeString1"
"SomeName2",26,"SomeString2"
"SomeName3",27,"SomeString3"

I am loading this CSV into a hive table. In the table, column 1 and 3 get inserted together with the quotes which I do not want. I want column 1 to be SomeName1 and column 3 to be SomeString1

I have tried with

WITH SERDEPROPERTIES (
   "separatorChar" = "\t",
   "quoteChar"     = "\""
)

but it does not work and keeps the "".

What should be the approach here?

Table creation statement:

CREATE TABLE `abcdefgh`(
  `name` string COMMENT 'from deserializer',
  `age` string COMMENT 'from deserializer',
  `value` string COMMENT 'from deserializer')
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'quoteChar'='\"',
  'separatorChar'='\t')
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://a-b-c-d-e:9000/user/hive/warehouse/abcdefgh'
TBLPROPERTIES (
  'numFiles'='1',
  'numRows'='0',
  'rawDataSize'='0',
  'totalSize'='3134916',
  'transient_lastDdlTime'='1490713221')
earl
  • 738
  • 1
  • 17
  • 38
  • Which SERDE are you using ? Please post the complete table creation query – cheseaux Mar 28 '17 at 15:07
  • ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' – earl Mar 28 '17 at 15:11
  • CREATE TABLE `abcdefgh`( `name` string COMMENT 'from deserializer', `age` string COMMENT 'from deserializer', `value` string COMMENT 'from deserializer') ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES ( 'quoteChar'='\"', 'separatorChar'='\t') STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'hdfs://a-b-c-d-e:9000/user/hive/warehouse/abcdefgh' TBLPROPERTIES ( 'numFiles'='1', 'numRows'='0', 'rawDataSize'='0', ,) – earl Mar 28 '17 at 15:12
  • It would be better to edit your question instead of writing in the comment section, it isn't readable – cheseaux Mar 28 '17 at 15:12

1 Answers1

5

Your delimiter should be a comma: "separatorChar" = ','

create external table mytable 
(
    col1 string
   ,col2 int
   ,col3 string
)
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
with serdeproperties 
(
    "separatorChar" = ','
   ,"quoteChar"     = '"'
)  
stored as textfile
;

select * from mytable
;

+--------------+--------------+--------------+
| mytable.col1 | mytable.col2 | mytable.col3 |
+--------------+--------------+--------------+
| SomeName1    |           25 | SomeString1  |
| SomeName2    |           26 | SomeString2  |
| SomeName3    |           27 | SomeString3  |
+--------------+--------------+--------------+
David דודו Markovitz
  • 42,900
  • 6
  • 64
  • 88