I am looking for an alternate way to compress files for read/write performance, and one of the avenues I have explored is Snappy compression.
So far it has been going well: I have been able to get the compressed file into HDFS and decompress it with the -text command to see the values. The real issue arises when I try to import the data into Hive.
When I import the data into Hive, I create a simple external table and set the parameters for reading a Snappy-compressed file:
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;
CREATE EXTERNAL TABLE IF NOT EXISTS test(...
..
)
LOCATION '/user/.../';
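For context, a fuller version of the DDL I am describing might look like the sketch below. The column names, the tab delimiter, and the path are placeholders, not my real schema:

```sql
-- Hypothetical schema for illustration only; the real table has different columns.
CREATE EXTERNAL TABLE IF NOT EXISTS test (
  id   INT,
  name STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'  -- assumes a tab-delimited text file
STORED AS TEXTFILE         -- Snappy-compressed text files are still declared as TEXTFILE
LOCATION '/user/example/path/';  -- placeholder path
```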
When I run SELECT COUNT(*) FROM test; I get the correct row count; however, when I run SELECT * FROM test LIMIT 100; all I see are NULL values. Why is this happening? Any thoughts?