I am trying to create an external hive table on existing avro files. Below is the query.

CREATE EXTERNAL TABLE sample
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/user/sshusr/sample/'
TBLPROPERTIES ('avro.schema.url'='/user/sshusr/avsc_files/sample.avsc');

The table was created and I can see the data using simple SELECT queries. However, a few columns in the Avro data can contain line breaks. For example, the comments column can hold whole paragraphs (with newline characters). Because of this, the data is not loaded properly into the table: wherever the Avro SerDe encounters a newline character inside a column, it treats it as the start of the next record/row. I couldn't find any examples on the internet. Is there any workaround to handle this situation?

Thanks in advance.

E_net4
  • Are you using Sqoop import to get this '/user/sshusr/avsc_files/sample.avsc' data? If so, you can use --hive-delims-replacement "" as one of the parameters while importing. – sk7979 Nov 06 '17 at 12:56
  • Hi, thank you for the reply. The `avsc` file that I used is for loading the table schema. There is no problem with the `avsc` file. The only problem is with the actual table data. The data in a cell contains newline characters, so the SerDe is not able to deserialize the Avro properly. – Sri Harsha Chennavajjala Nov 07 '17 at 05:09

1 Answer

This is already fixed in Hive 2.0.0. On earlier versions, the only workaround is to replace the newline characters with something else in the SELECT statement.
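
As a rough sketch of that workaround, assuming the affected column is called `comments` (a name taken from the question's description, not from the actual schema), the line breaks could be replaced with a space using Hive's built-in `regexp_replace`:

```sql
-- Sketch only: `comments` is an assumed column name; adjust it to the real schema.
-- Replaces each run of carriage returns / line feeds with a single space,
-- so a multi-line value is returned as one row.
SELECT regexp_replace(comments, '[\\r\\n]+', ' ') AS comments
FROM sample;
```

The replacement character is a matter of choice; a space keeps the text readable, while a marker such as `\\n` written literally would let the original breaks be restored later.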

hlagos