I am looking for an alternate way to compress files for read/write performance, and one of the avenues I have explored is Snappy compression.
So far it has been going well: I have been able to get the compressed file into HDFS and decompress it with the -text command to see the values. The real issue arises when I try to import the data into Hive.
When I import the data into Hive, I create a simple external table and set the parameters for reading a Snappy-compressed file:
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;
CREATE EXTERNAL TABLE IF NOT EXISTS test(...
..
)
LOCATION '/user/.../';
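For context, a fuller version of the DDL I am describing might look like the sketch below. The column names, the tab delimiter, and the path are placeholders, not my real schema:

```sql
-- Hypothetical schema for illustration only; the real table has different columns.
CREATE EXTERNAL TABLE IF NOT EXISTS test (
  id   INT,
  name STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'  -- assumes a tab-delimited text file
STORED AS TEXTFILE         -- Snappy-compressed text files are still declared as TEXTFILE
LOCATION '/user/example/path/';  -- placeholder path
```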
When I run SELECT COUNT(*) FROM test; I get the correct row count; however, when I run SELECT * FROM test LIMIT 100; all I see are NULL values. Why is this happening? Any thoughts?