
I ran into a problem when reading a Snappy-compressed file from HDFS in a MapReduce job.

I set job.setInputFormatClass(TextInputFormat.class); in the job driver.
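
For context, my driver looks roughly like this (simplified sketch; LineCountDriver, LineMapper, and the map-only setup are stand-in names, not my exact code):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class LineCountDriver {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "read snappy lines");
            job.setJarByClass(LineCountDriver.class);

            // read the input line by line
            job.setInputFormatClass(TextInputFormat.class);
            job.setMapperClass(LineMapper.class);   // the mapper shown below
            job.setNumReduceTasks(0);               // map-only job

            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(NullWritable.class);

            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }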

Then I read each line value in the mapper like this:

    protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] strs = new String(value.getBytes()).split(String.valueOf(0x09));
        LOGGER.info("strs length is " + strs.length);
    }

If the data were correct, strs.length would be 44, but the userlogs show many wrong lengths (larger than 44).

Then I printed new String(value.getBytes()) and the output is not what I expected: the line is not the string I saved to HDFS.

The data comes out in the wrong order, which leads to the wrong value in the mapper.

What can I do to solve this?

Thanks!

huyang

1 Answer


You are reading a Snappy-compressed text file, but you have set job.setInputFormatClass(TextInputFormat.class), which means the job expects a plain text file. You first have to decompress your file to plain text, and then run your MR job on that file.
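
One way to do the decompression is with Hadoop's codec factory, roughly like this (a sketch; the class name SnappyToText and the paths are placeholders, and it assumes the file has a .snappy extension so the codec can be resolved, and that it was written through Hadoop's SnappyCodec rather than raw snappy-java framing):

    import java.io.InputStream;
    import java.io.OutputStream;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionCodecFactory;

    public class SnappyToText {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            Path in = new Path(args[0]);   // e.g. the .snappy input file
            Path out = new Path(args[1]);  // decompressed text output

            // Resolve the codec from the file extension (.snappy -> SnappyCodec)
            CompressionCodec codec = new CompressionCodecFactory(conf).getCodec(in);
            if (codec == null) {
                throw new IllegalArgumentException("No codec found for " + in);
            }

            // Stream the decompressed bytes straight back to HDFS
            try (InputStream is = codec.createInputStream(fs.open(in));
                 OutputStream os = fs.create(out)) {
                IOUtils.copyBytes(is, os, conf);
            }
        }
    }

If the file was written through Hadoop's compression layer, hadoop fs -text /path/file.snappy should also print the decompressed content, which is a quick way to verify the data before rerunning the job.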

salmanbw