
I ran into a problem when reading a Snappy-compressed file from HDFS in a MapReduce job.

I set job.setInputFormatClass(TextInputFormat.class); in the job driver.
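
For context, my driver looks roughly like this (simplified sketch; LineCountDriver, LineMapper, and the map-only setup are stand-in names, not my exact code):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class LineCountDriver {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "read snappy lines");
            job.setJarByClass(LineCountDriver.class);

            // read the input line by line
            job.setInputFormatClass(TextInputFormat.class);
            job.setMapperClass(LineMapper.class);   // the mapper shown below
            job.setNumReduceTasks(0);               // map-only job

            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(NullWritable.class);

            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }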

Then I read each line value in the mapper like this:

    protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] strs = new String(value.getBytes()).split(String.valueOf(0x09));
        LOGGER.info("strs length is " + strs.length);
    }

If the data were correct, strs.length would be 44, but the userlogs show many wrong lengths (larger than 44).

Then I printed new String(value.getBytes()) and the output is not what I expected: the line is not the string I saved to HDFS.

The data comes out in the wrong order, which leads to the wrong value in the mapper.

What can I do to solve this?

Thanks!

huyang

1 Answer


You are reading a Snappy-compressed text file, but you have set job.setInputFormatClass(TextInputFormat.class), which means the job expects a plain text file. You first have to decompress your file to plain text, and then run your MR job on that file.
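
One way to do the decompression is with Hadoop's codec factory, roughly like this (a sketch; the class name SnappyToText and the paths are placeholders, and it assumes the file has a .snappy extension so the codec can be resolved, and that it was written through Hadoop's SnappyCodec rather than raw snappy-java framing):

    import java.io.InputStream;
    import java.io.OutputStream;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionCodecFactory;

    public class SnappyToText {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            Path in = new Path(args[0]);   // e.g. the .snappy input file
            Path out = new Path(args[1]);  // decompressed text output

            // Resolve the codec from the file extension (.snappy -> SnappyCodec)
            CompressionCodec codec = new CompressionCodecFactory(conf).getCodec(in);
            if (codec == null) {
                throw new IllegalArgumentException("No codec found for " + in);
            }

            // Stream the decompressed bytes straight back to HDFS
            try (InputStream is = codec.createInputStream(fs.open(in));
                 OutputStream os = fs.create(out)) {
                IOUtils.copyBytes(is, os, conf);
            }
        }
    }

If the file was written through Hadoop's compression layer, hadoop fs -text /path/file.snappy should also print the decompressed content, which is a quick way to verify the data before rerunning the job.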

salmanbw