I am new to Avro and Hive, and while learning them I got confused about using
tblproperties('avro.schema.url'='somewhereinHDFS/categories.avsc').
I run a create command like this one:
create table categories (id Int, dep_Id Int, name String)
stored as avrofile
tblproperties('avro.schema.url'=
'hdfs://quickstart.cloudera/user/cloudera/data/retail_avro_avsc/categories.avsc')
But why should I have to declare id Int, dep_Id Int
in the above command when I am already giving it an avsc
file that contains the complete schema? When I leave the columns out, the command fails:
create table categories stored as avrofile
tblproperties('avro/schema.url'=
'hdfs://quickstart.cloudera/user/cloudera/data/retail_avro_avsc/categories.avsc')
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask.
java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException
Encountered AvroSerdeException determining schema.
Returning signal schema to indicate problem:
Neither avro.schema.literal nor avro.schema.url specified,
can't determine table schema)
Why does Hive need the schema to be specified in the command even though the avsc
file is present and already contains the schema?