
I have a Hive table based on an Avro schema. The table was created with the following query (note that the partition columns must be declared as `name type`):

    CREATE EXTERNAL TABLE datatbl
    PARTITIONED BY (date string, time int)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
    WITH SERDEPROPERTIES ('avro.schema.url'='path to schema file on HDFS')
    STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
    LOCATION '<path on hdfs>';

So far we have been inserting data into the table by setting the following properties

    hive> set hive.exec.compress.output=true;
    hive> set avro.output.codec=snappy;

However, if someone forgets to set these two properties, the data is written uncompressed. Is there a way to enforce compression on the table itself, so that the data is always compressed even when these properties are not set?
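For context, a full insert session with the session-level approach looks roughly like this (the partition values and source table here are illustrative assumptions, not from my actual workload):

    hive> set hive.exec.compress.output=true;
    hive> set avro.output.codec=snappy;
    hive> INSERT INTO TABLE datatbl PARTITION (date='2017-03-28', time=12)
        > SELECT col1, col2 FROM staging_tbl;

If either `set` statement is skipped, the resulting Avro files in the partition directory are uncompressed.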

Vikas Saxena

1 Answer


Yes, you can set the properties in the table. Try the following:

    CREATE EXTERNAL TABLE datatbl
    PARTITIONED BY (date string, time int)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
    WITH SERDEPROPERTIES ('avro.schema.url'='path to schema file on HDFS')
    STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
    LOCATION '<path on hdfs>'
    TBLPROPERTIES ("orc.compress"="SNAPPY");
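Since the table in question is Avro-backed rather than ORC-backed, the Avro-specific codec property can presumably be pinned on the table the same way; this variant is an assumption on my part, not part of the answer above:

    ALTER TABLE datatbl SET TBLPROPERTIES ('avro.output.codec'='snappy');

Setting it via `ALTER TABLE ... SET TBLPROPERTIES` avoids recreating the table, and table properties are passed to jobs writing into the table, so the codec applies without the session-level `set` statements.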
dbustosp
  • @VikasSaxena Yes! You can set it the same way: STORED AS ORC tblproperties ("orc.compress"="ZLIB") for an ORC table. Take a look at this: https://hadoopist.wordpress.com/2015/01/03/how-to-create-orc-tables-in-hive-an-analysis/ – dbustosp Mar 28 '17 at 22:43