0

I am new to avro and hive and while learning it i got some confusion. Using

tblproperties('avro.schema.url'='somewhereinHDFS/categories.avsc').

If I run this create command like

create table categories (id Int , dep_Id Int , name String) 
stored as avrofile  
tblproperties('avro.schema.url'=
'hdfs://quickstart.cloudera/user/cloudera/data/retail_avro_avsc/categories.avsc')

but why should i use id Int, dep_Id Int in above command even if i am giving avsc file which contains complete schema.

create table categories stored as avrofile
tblproperties('avro/schema.url'=
'hdfs://quickstart.cloudera/user/cloudera/data/retail_avro_avsc/categories.avsc')

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. 
java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException 
Encountered AvroSerdeException determining schema. 
Returning signal schema to indicate problem: 
Neither avro.schema.literal nor avro.schema.url specified, 
can't determine table schema)

Why does hive need to specify the schema even if the avsc file is present and it already contains the schema?

Jaime Caffarel
  • 2,401
  • 4
  • 30
  • 42
Anaadih.pradeep
  • 2,453
  • 4
  • 18
  • 25

2 Answers2

1

Can you try to do it in this way?

CREATE TABLE categories
  ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  TBLPROPERTIES (
    'avro.schema.url'='http://schema.avsc');

More info here https://cwiki.apache.org/confluence/display/Hive/AvroSerDe

RadioLog
  • 582
  • 1
  • 4
  • 20
0

Creating an external hive table orders_sqoop from given avro-schema file and avro-data file:

 hive> create external table if not exists orders_sqoop
        stored as avro
        location '/user/hive/warehouse/retail_stage.db/orders'
        tblproperties('avro.schema.url'='/user/hive/warehouse/retail_stage.db/orders_schema/orders.avsc');

The above create table command executes successfully and creates orders_sqoop table.

Validate the table structure below:

hive> show create table orders_sqoop;
OK
CREATE EXTERNAL TABLE `orders_sqoop`(
  `order_id` int COMMENT '', 
  `order_date` bigint COMMENT '', 
  `order_customer_id` int COMMENT '', 
  `order_status` string COMMENT '')
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION
  'hdfs://quickstart.cloudera:8020/user/hive/warehouse/retail_stage.db/orders'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='false', 
  'avro.schema.url'='/user/hive/warehouse/retail_stage.db/orders_schema/orders.avsc', 
  'numFiles'='2', 
  'numRows'='-1', 
  'rawDataSize'='-1', 
  'totalSize'='660906', 
  'transient_lastDdlTime'='1563093902')
Time taken: 0.125 seconds, Fetched: 21 row(s)

The above table created as expected.

Farooque
  • 3,616
  • 2
  • 29
  • 41